Erlang Factory Lite Paris

Lessons learned
How we use Erlang to analyze millions of messages per day
©Semiocast
A few words about us
What we do with Erlang
A few words about us
Semiocast processes social media conversations to
provide analytics and market research insights
Sample studies (slide images):
- "La crise de Guerlain sur le web temps réel" (The Guerlain crisis on the real-time web), Semiocast analysis, October: chronology of events, number of messages per hour, themes by key period with example messages, number of messages per day mentioning the brand by theme, internationalization of the crisis, geographic distribution of calls for a boycott
- A brand on real-time web networks, August (study by Semiocast): growth in messages about the brand, number of messages per day for all brands, sentiment of messages, languages of messages
semiocast.com
Semiocast’s offer
Quantitative studies (ad hoc)
- Barometers
- Shares of social media conversations
- Sentiment analysis and clustering
- Topic identification
- Ad hoc quantitative indicators
Qualitative studies (ad hoc)
- Consumer insights
- Verbatim research
- Clustering of conversations
- Mapping of communities and influence analysis
- Enumeration of social media conversation spaces
Monitoring
- Real-time alerts
- Daily/Weekly/Monthly reports
- Crisis monitoring
Tools
- Social media monitoring platform (Semioboard)
- Technology as a service (API)
Semioboard
Live analysis of comments on TV debates
How we ended up using Erlang
Discovered Erlang when getting WiFi rabbits to
talk to each other over XMPP (ejabberd) in 2007
Taught OCaml in 2004
Three reasons why we chose Erlang:
- hot code change and inspection
- fault-tolerance
- happy to do functional programming
(gave us a break from Java and C++)
How we use Erlang
- 1352 OTP releases
- 47 applications
- 100K lines of Erlang (without tests)
- 11K lines of C/C++ (mostly glue)
- 1K lines of Java (glue)
~50 ungraduated applications for
- prototyping
- short lived projects
- web-based/command line tools
that run on dev machines
{release, {"Semiocast OTP", "1352"}, {erts, "5.8.4"},
[
% erts 5.8.4
{kernel, "2.14.4"},
{stdlib, "1.17.4"},
{mnesia, "4.4.18"},
{inets, "5.5.2"},
{sasl, "2.1.9.3"},
{crypto, "2.0.2.1"},
{snmp, "4.19"},
{otp_mibs, "1.0.6"},
{ssl, "4.1.5"},
{public_key, "0.12"},
{xmerl, "1.2.8"},
{compiler, "4.7.3"},
{runtime_tools, "1.8.5"},
{syntax_tools, "1.6.7"},
% Other libs
{erlsom, "1.2.1"},
{mochiweb, "0.167.10"},
{nitrogen, "2.0.20100531.14"},
{nprocreg, "0.1"},
{simple_bridge, "1.0.2"},
% Semiocast
{analyzer, "22", load},
{alien_models, "50", load},
{alien_uniform_streams, "7", load},
{api_server, "109", load},
{aspell, "4", load},
{binlog, "26", load},
{certificate_authority, "8", load},
{chasen, "11", load},
{chinese_segmenter, "1", load},
{commonlib, "250", permanent}, % always
{ctl, "29", permanent}, % always
{dashboard_engine, "55", load},
{dashboard_storage, "56", load},
{dashboard_website, "226", load},
{developer_website, "36", load},
{engine, "455", load},
{gate, "79", load},
{geodb, "5", load},
{geoip, "2", load},
{image_magick, "8", load},
{kdtree, "3", load},
{kqueue, "1", load},
{link_grammar_parser, "12", load},
{memcached, "5", load},
{mysql, "8", load},
{nagios, "6", permanent}, % always
{opennlp, "8", load},
{pgsql, "2", load},
{pubsubhubbub, "4", load},
{qr_website, "1", load},
{re2, "1", load},
{s_http, "42", load},
{setproctitle, "2", permanent}, % always
{sink, "15", load},
{sqlite, "9", load},
{storage, "226", load},
{svg, "12", load},
{svm, "1", load},
{text_ident, "27", load},
{text_proc, "62", load},
{titema_website, "7", load},
{url_server, "8", load},
{uuid, "6", load},
{web_common, "14", load},
{web_gate, "8", load},
{web_storage, "5", load},
{wikimedia, "6", load}
]}.
start commonlib.
start ctl.
start nagios.
start setproctitle.
A few things we wish we had known about Erlang
Mistake #1:
Creating an Erlang process to do a lot of work
- processes should spend most of their time
waiting for messages (gen_server), or do some
intensive work and quickly exit when done
(spawn_link)
- when required, benchmark, as message passing
with the worker process can prove expensive
Self = self(),
spawn_link(fun() -> Self ! {language, analyze_language(Text, MD0, Mode)} end),
spawn_link(fun() -> Self ! {location, analyze_location(MD0, Mode)} end),
{NProcessedLang, Language} = receive {language, RespLa} -> RespLa end,
{NProcessedLoc, Location} = receive {location, RespLoc} -> RespLoc end,
Faster
{NProcessedLang, Language} = analyze_language(Text, MD0, Mode),
{NProcessedLoc, Location} = analyze_location(MD0, Mode),
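The advice to benchmark can be followed with timer:tc/1. The sketch below is only an illustration: the module name spawn_bench and the analyze/1 stand-in are hypothetical and are not Semiocast's code. It times the same cheap function called directly and called through a short-lived linked process.

```erlang
-module(spawn_bench).
-export([run/1, direct/1, spawned/1]).

%% Hypothetical stand-in for a cheap analysis step.
analyze(Text) -> byte_size(Text).

%% Call the function directly in the current process.
direct(Text) -> analyze(Text).

%% Do the same work in a short-lived linked process and wait for its reply.
spawned(Text) ->
    Self = self(),
    Ref = make_ref(),
    spawn_link(fun() -> Self ! {Ref, analyze(Text)} end),
    receive {Ref, Result} -> Result end.

%% Time N iterations of each variant; the times are in microseconds.
run(N) ->
    Text = <<"some text to analyze">>,
    {DirectUs, ok} = timer:tc(fun() -> repeat(N, fun() -> direct(Text) end) end),
    {SpawnedUs, ok} = timer:tc(fun() -> repeat(N, fun() -> spawned(Text) end) end),
    [{direct_us, DirectUs}, {spawned_us, SpawnedUs}].

%% Call F N times, discarding its result.
repeat(0, _F) -> ok;
repeat(N, F) -> F(), repeat(N - 1, F).
```

Calling spawn_bench:run(100000) in a shell gives the two timings side by side. Whether spawning pays off depends on how expensive the real function is compared with process creation and message copying.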
Mistake #2:
Creating a lot of processes for parallelized
computing
- having more worker processes than schedulers
does not make sense
- it can actually hurt, because processes waiting
for a reply may not have it in time and will fail
with a timeout
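One way to respect this limit is to size the worker pool from the number of schedulers. The following is a minimal sketch under that assumption; the module bounded_pmap and its functions are hypothetical names, not part of OTP or of the talk.

```erlang
-module(bounded_pmap).
-export([pmap/2]).

%% Apply F to every element of List using at most one worker process
%% per scheduler. Results are returned in input order.
pmap(F, List) ->
    NWorkers = erlang:system_info(schedulers),
    Chunks = split(List, NWorkers),
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, lists:map(F, Chunk)} end),
                Ref
            end || Chunk <- Chunks],
    %% Collect the replies in the order the chunks were handed out.
    lists:append([receive {Ref, Res} -> Res end || Ref <- Refs]).

%% Split List into at most N contiguous chunks.
split(List, N) ->
    Size = max(1, (length(List) + N - 1) div N),
    chunk(List, Size).

chunk([], _Size) -> [];
chunk(List, Size) when length(List) =< Size -> [List];
chunk(List, Size) ->
    {Head, Tail} = lists:split(Size, List),
    [Head | chunk(Tail, Size)].
```

Each worker handles a whole chunk, so the number of processes stays bounded however long the input list is.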
Thinking OTP
Mistake #3:
Starting processes outside the supervision tree
- gen_server & co. should be used everywhere,
except for very short lived processes (that won’t
be upgraded)
- Every gen_server should be started from a
supervisor
- A real-world supervision design requires
process_flag(trap_exit, true),
monitor/2 and link/1, as well as some
thinking
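As an illustration of the last point, here is a minimal gen_server sketch that traps exits and monitors other processes. The module name tracked_worker and its API are hypothetical. In a real system it would be started from a supervisor child specification rather than by hand.

```erlang
-module(tracked_worker).
-behaviour(gen_server).
-export([start_link/0, watch/2, watched/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() -> gen_server:start_link(?MODULE, [], []).

%% Ask the server to monitor Pid.
watch(Server, Pid) -> gen_server:call(Server, {watch, Pid}).

%% Return the pids the server currently monitors.
watched(Server) -> gen_server:call(Server, watched).

init([]) ->
    %% With trap_exit set, exits of linked processes arrive as
    %% {'EXIT', Pid, Reason} messages, and terminate/2 runs when the
    %% supervisor shuts this process down.
    process_flag(trap_exit, true),
    {ok, []}.

handle_call({watch, Pid}, _From, Watched) ->
    Ref = erlang:monitor(process, Pid),
    {reply, ok, [{Ref, Pid} | Watched]};
handle_call(watched, _From, Watched) ->
    {reply, [Pid || {_Ref, Pid} <- Watched], Watched}.

handle_cast(_Msg, State) -> {noreply, State}.

%% A monitored process died: forget it.
handle_info({'DOWN', Ref, process, _Pid, _Reason}, Watched) ->
    {noreply, lists:keydelete(Ref, 1, Watched)};
%% A linked process exited: this sketch simply ignores it.
handle_info({'EXIT', _Pid, _Reason}, State) ->
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

Monitors are one-directional, so the server learns about a client's death without being killed by it. Links, combined with trap_exit, are the choice when both sides should know.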
%%---------------------------------------------------------------
%% @doc supervisor init callback.
%%
-spec(init/1::(any()) -> sup_init()).
init(_Args) ->
    UserServerSupSpec = {user_server_sup, % id
        {user_server_sup, start_link, []}, % init function
        permanent, % restart children that crash
        ?SHUTDOWN_DELAY, supervisor,
        [user_server_sup] % modules
    },
    AlienUserServerSupSpec = {alien_user_server_sup, % id
        {alien_user_server_sup, start_link, []}, % init function
        permanent, % restart children that crash
        ?SHUTDOWN_DELAY, supervisor,
        [alien_user_server_sup] % modules
    },
    MailboxServerSupSpec = {mailbox_server_sup, % id
        {mailbox_server_sup, start_link, []}, % init function
        permanent, % restart children that crash
        ?SHUTDOWN_DELAY, supervisor,
        [mailbox_server_sup] % modules
    },
    UserManagerStartChildSemaphore = {user_manager_start_child_sem
        {semaphore, start_link, [{local, user_manager_start_child_
        transient, % restart children that crash
        ?SHUTDOWN_DELAY, worker,
        [semaphore] % modules
    },
    UserManagerSpec = {user_manager, % id
        {user_manager, start_link, []}, % init function
        transient, % restart children that crash
        ?SHUTDOWN_DELAY, worker,
        [gen_manager, user_manager] % modules
    },
    RestartStrategy = {one_for_one, ?MAX_RESTARTS, ?MAX_RESTARTS_P
    % Setup snmp probe for semaphore.
    gen_snmp_agent:set_variable(?SNMP_USER_MANAGER_START_CHILD_SEM
Notes on the code:
- As a rule of thumb, Modules should be a list with one element [Module], where Module is the callback module, if the child process is a supervisor, gen_server or gen_fsm (erl -man supervisor)
- gen_manager is a gen_server with a callback module handling code_change messages (here, user_manager)
Mistake #4:
Thinking obscure comments in the documentation
do not really apply
- When in doubt, source code is handy, and helps
figure out when we really need to deviate from the rule
Mistake #5:
Putting everything in a single virtual machine (node) per
server
- Virtual machines may crash
- Code changes can fail and take down the whole
node
- It’s better to separate critical code, even onto its
own server if a crashing node can take a huge
amount of RAM and make other nodes swap
Foreign code
Interfacing with foreign code
Six ways to interface foreign code with Erlang:
- Linked-In drivers
- External drivers
- NIFs
- os:cmd/1
- C-based distributed node
- Java-based distributed node (jinterface)
We tried them all…
…and are looking forward to future extensions to the
native interface (R15?)
Method
- Linked-in drivers
- External drivers
- NIFs
- os:cmd/1
- C-based distributed node
- jinterface
We use/used for
- aspell
- kqueue (FreeBSD/MacOS X kqueue binding)
- SQLite
- ImageMagick
- GeoIP
- WebKit
- uuid
- re2 (linear time bound replacement for re)
- bzip2
- OpenSSL
- Batik (svg rasterizer in Java)
- ruby (we actually bound Rails websites with Erlang at some point)
- OpenNLP
Mistake #6:
Using linked-in drivers for open source code that
could crash/abort
E.g.: ImageMagick will abort on bad input
- External drivers are more suitable when external
library is large, crash-prone or could leak
(sometimes, the leak is in the glue…)
- Passing pointers is possible but requires some
logic, typically binding a pointer to the port
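To illustrate the isolation an external program gives, here is a minimal port sketch. It assumes a Unix system where `cat` is on the PATH; `cat` stands in for a real external driver, and the module name port_echo is hypothetical. If the external program crashes, the port closes and the Erlang VM keeps running.

```erlang
-module(port_echo).
-export([echo/1]).

%% Send Data to an external `cat` process over a port and return
%% what it writes back.
echo(Data) when is_binary(Data) ->
    Port = open_port({spawn, "cat"}, [binary, stream, exit_status]),
    true = port_command(Port, Data),
    Reply = collect(Port, byte_size(Data), <<>>),
    true = port_close(Port),
    Reply.

%% Read chunks from the port until the expected number of bytes arrived.
collect(_Port, 0, Acc) ->
    Acc;
collect(Port, Remaining, Acc) ->
    receive
        {Port, {data, Chunk}} ->
            collect(Port, Remaining - byte_size(Chunk),
                    <<Acc/binary, Chunk/binary>>);
        {Port, {exit_status, Status}} ->
            %% The external program died: report it instead of hanging.
            erlang:error({port_exited, Status})
    after 5000 ->
        erlang:error(timeout)
    end.
```

A real external driver would usually frame its messages, for example with the {packet, 4} port option, instead of counting bytes as this sketch does.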
Mistake #7:
Using linked-in drivers for I/O intensive code
E.g.: sqlite
- Linked-in driver code is executed within a
scheduler thread. Running for too long will starve
other processes, which will time out waiting for
messages
- Theoretically, we can use async threads (and we
do with sqlite). However, enabling async threads
(+A) has a huge impact on built-in I/O drivers
- Performance could be worse with external
drivers
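Before tuning +A, it can help to check how many schedulers and async threads a node actually runs with. This small sketch uses erlang:system_info/1; the module name vm_threads is hypothetical.

```erlang
-module(vm_threads).
-export([report/0]).

%% Report the number of scheduler threads and the size of the async
%% thread pool (the value set with the +A emulator flag).
report() ->
    [{schedulers, erlang:system_info(schedulers)},
     {async_threads, erlang:system_info(thread_pool_size)}].
```

For example, starting the node with `erl +A 16` and calling vm_threads:report() should show an async pool of 16 threads.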
Erlang technologies we love hate
dialyzer
- part of our compilation cycle
- found many bugs, typically inconsistencies between
callers and callees
- starting with -spec when defining exported funs
We wish it would be fixed/improved:
- horribly slow
- sometimes blind
- hard to understand
- fails on code_change code
- useless warnings that cannot be disabled
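A minimal sketch of starting with -spec on an exported function. The module spec_example and the function word_count/1 are hypothetical. Given the spec, dialyzer can flag a caller that passes, say, a list instead of a binary.

```erlang
-module(spec_example).
-export([word_count/1]).

%% Count the space-separated words in a binary.
-spec word_count(binary()) -> non_neg_integer().
word_count(Text) ->
    %% trim_all drops the empty parts produced by repeated spaces.
    length(binary:split(Text, <<" ">>, [global, trim_all])).
```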
snmp
- makes it really easy to integrate Erlang nodes within a
monitoring solution (we use nagios and munin)
HiPE
OTP installed with --enable-native-libs and all our code
compiled with +native
- helps with CPU-bound work (including dialyzer…)
- most patches we submitted were HiPE-related
We are also grateful to the authors of:
erlsom, mochiweb, nitrogen, zotonic…
Thank you!
paul@semiocast.com