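About ten years ago, PHP + MySQL and the like seemed to mark the heyday of the RDB (I can't say how things stand now, of course..), but lately NoSQL DBs seem to be in use all over the place. For data sinks that have to ingest and handle large volumes of data, a distributed DB like Cassandra is one of the strongest options.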


So today I tried installing Cassandra. What Cassandra is and how it works can be explained well enough after it is installed and we start building things with it, so let's get the installation done first.




Installation and Running


1. How to install

1. Download (http://cassandra.apache.org/download/)

cassandra@ubuntu:~$ wget http://apache.mirror.cdnetworks.com/cassandra/1.1.6/apache-cassandra-1.1.6-bin.tar.gz


2. Extract the archive

cassandra@ubuntu:~$ tar xvfz apache-cassandra-1.1.6-bin.tar.gz

cassandra@ubuntu:~$ mkdir bin

cassandra@ubuntu:~$ mv apache-cassandra-1.1.6 ./bin

cassandra@ubuntu:~$ cd bin

cassandra@ubuntu:~/bin$ ln -s apache-cassandra-1.1.6 cassandra


3. To change logging or any other settings, edit the two files below.

cassandra@ubuntu:~/bin$ cd cassandra

cassandra@ubuntu:~/bin/cassandra$ cd conf

cassandra.yaml --> basic settings

log4j-server.properties --> logging settings


2. How to run

cassandra@ubuntu:~$ bin/cassandra/bin/cassandra

xss =  -ea -javaagent:bin/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms753M -Xmx753M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss180k

 INFO 07:44:21,693 Logging initialized

 INFO 07:44:21,697 JVM vendor/version: Java HotSpot(TM) Client VM/1.6.0_37

 INFO 07:44:21,697 Heap size: 779091968/780140544

 INFO 07:44:21,698 Classpath: bin/cassandra/bin/../conf:bin/cassandra/bin/../build/classes/main:bin/cassandra/bin/../build/classes/thrift:bin/cassandra/bin/../lib/antlr-3.2.jar:bin/cassandra/bin/../lib/apache-cassandra-1.1.6.jar:bin/cassandra/bin/../lib/apache-cassandra-clientutil-1.1.6.jar:bin/cassandra/bin/../lib/apache-cassandra-thrift-1.1.6.jar:bin/cassandra/bin/../lib/avro-1.4.0-fixes.jar:bin/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:bin/cassandra/bin/../lib/commons-cli-1.1.jar:bin/cassandra/bin/../lib/commons-codec-1.2.jar:bin/cassandra/bin/../lib/commons-lang-2.4.jar:bin/cassandra/bin/../lib/compress-lzf-0.8.4.jar:bin/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:bin/cassandra/bin/../lib/guava-r08.jar:bin/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:bin/cassandra/bin/../lib/jackson-core-asl-1.9.2.jar:bin/cassandra/bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/cassandra/bin/../lib/jamm-0.2.5.jar:bin/cassandra/bin/../lib/jline-0.9.94.jar:bin/cassandra/bin/../lib/json-simple-1.1.jar:bin/cassandra/bin/../lib/libthrift-0.7.0.jar:bin/cassandra/bin/../lib/log4j-1.2.16.jar:bin/cassandra/bin/../lib/metrics-core-2.0.3.jar:bin/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:bin/cassandra/bin/../lib/slf4j-api-1.6.1.jar:bin/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:bin/cassandra/bin/../lib/snakeyaml-1.6.jar:bin/cassandra/bin/../lib/snappy-java-1.0.4.1.jar:bin/cassandra/bin/../lib/snaptree-0.1.jar:bin/cassandra/bin/../lib/jamm-0.2.5.jar

 INFO 07:44:21,699 JNA not found. Native methods will be disabled.

 INFO 07:44:21,710 Loading settings from file:/home/cassandra/bin/apache-cassandra-1.1.6/conf/cassandra.yaml

 INFO 07:44:21,826 32bit JVM detected.  It is recommended to run Cassandra on a 64bit JVM for better performance.

 INFO 07:44:21,826 DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard

 INFO 07:44:22,008 Global memtable threshold is enabled at 248MB

 INFO 07:44:22,239 Initializing key cache with capacity of 37 MBs.

 INFO 07:44:22,249 Scheduling key cache save to each 14400 seconds (going to save all keys).

 INFO 07:44:22,253 Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider

 INFO 07:44:22,258 Scheduling row cache save to each 0 seconds (going to save all keys).

 INFO 07:44:22,353 Opening /home/cassandra/data/system/Versions/system-Versions-hf-1 (247 bytes)

 INFO 07:44:22,379 Opening /home/cassandra/data/system/LocationInfo/system-LocationInfo-hf-6 (346 bytes)

 INFO 07:44:22,385 Opening /home/cassandra/data/system/LocationInfo/system-LocationInfo-hf-5 (163 bytes)

 INFO 07:44:22,445 Couldn't detect any schema definitions in local storage.

 INFO 07:44:22,446 Found table data in data directories. Consider using the CLI to define your schema.

 INFO 07:44:22,467 completed pre-loading (2 keys) key cache.

 INFO 07:44:22,596 Replaying /home/cassandra/commitlog/CommitLog-1353080634664.log, /home/cassandra/commitlog/CommitLog-1353080634663.log

 INFO 07:44:22,606 Replaying /home/cassandra/commitlog/CommitLog-1353080634664.log

 INFO 07:44:22,615 Finished reading /home/cassandra/commitlog/CommitLog-1353080634664.log

 INFO 07:44:22,616 Replaying /home/cassandra/commitlog/CommitLog-1353080634663.log

 INFO 07:44:22,655 Finished reading /home/cassandra/commitlog/CommitLog-1353080634663.log

 INFO 07:44:22,687 Enqueuing flush of Memtable-Versions@19288329(83/103 serialized/live bytes, 3 ops)

 INFO 07:44:22,692 Writing Memtable-Versions@19288329(83/103 serialized/live bytes, 3 ops)

 INFO 07:44:22,742 Completed flushing /home/cassandra/data/system/Versions/system-Versions-hf-2-Data.db (247 bytes) for commitlog position ReplayPosition(segmentId=1353080662588, position=0)

 INFO 07:44:22,750 Log replay complete, 3 replayed mutations

 INFO 07:44:22,777 Cassandra version: 1.1.6

 INFO 07:44:22,777 Thrift API version: 19.32.0

 INFO 07:44:22,780 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0)

 INFO 07:44:22,846 Loading persisted ring state

 INFO 07:44:22,849 Starting up server gossip

 INFO 07:44:22,856 Enqueuing flush of Memtable-LocationInfo@29499086(29/36 serialized/live bytes, 1 ops)

 INFO 07:44:22,859 Writing Memtable-LocationInfo@29499086(29/36 serialized/live bytes, 1 ops)

 INFO 07:44:22,871 Completed flushing /home/cassandra/data/system/LocationInfo/system-LocationInfo-hf-7-Data.db (80 bytes) for commitlog position ReplayPosition(segmentId=1353080662588, position=363)

 INFO 07:44:22,892 Starting Messaging Service on port 7000

 INFO 07:44:22,904 Using saved token 77712367279614969246272394525491308416

 INFO 07:44:22,911 Enqueuing flush of Memtable-LocationInfo@12254719(53/66 serialized/live bytes, 2 ops)

 INFO 07:44:22,917 Writing Memtable-LocationInfo@12254719(53/66 serialized/live bytes, 2 ops)

 INFO 07:44:22,944 Completed flushing /home/cassandra/data/system/LocationInfo/system-LocationInfo-hf-8-Data.db (163 bytes) for commitlog position ReplayPosition(segmentId=1353080662588, position=544)

 INFO 07:44:22,959 Node master/192.168.0.20 state jump to normal

 INFO 07:44:22,960 Bootstrap/Replace/Move completed! Now serving reads.
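Once the log reaches "Now serving reads", a quick way to confirm the node is really listening is to try a TCP connection. The following is a minimal sketch, not part of Cassandra itself: `port_is_open` is a hypothetical helper name, and port 7000 is taken from the "Starting Messaging Service on port 7000" line in the log above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False
```

Calling `port_is_open("127.0.0.1", 7000)` on the machine running Cassandra should return True while the node is up, assuming the messaging service is bound to an address reachable from localhost.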


3. How to stop

cassandra@ubuntu:~$ pkill -f 'java.*cassandra'

Too easy..@.@


Next time I will explain Cassandra's Key, Column, and Value and talk about how to use them.


rekun,ekun 커뉴

Does a human being possess anything in this world more certain than a dream?

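A while ago I installed Linux under VMware on Windows, set up a single-node Hadoop, and tested it.

Hadoop exists to do distributed processing across several nodes joined together, but I had no such environment, so that test was purely for study. My memory of it is already fading, so I took out a very old laptop that had been packed away untouched, installed Ubuntu on it, and built a 2-node Hadoop together with the VM Linux environment on my desktop.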


Below is a summary of the installation steps so that I don't forget them next time.


First, screenshots of the jps output once the installation was complete.

1. master node (master, slave02)




2. slave01 node




The hosts file must have the same content, shown below, on every node. (The master node runs on the desktop and also plays the role of slave02.)

192.168.0.20    master

192.168.0.21    slave01

192.168.0.20    slave02
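To see at a glance which machine plays which roles, the hosts entries can be grouped by IP. This is only an illustrative sketch; `parse_hosts` and `hosts_by_ip` are hypothetical helper names, and the text is the three entries listed above.

```python
from collections import defaultdict

HOSTS_TEXT = """\
192.168.0.20    master
192.168.0.21    slave01
192.168.0.20    slave02
"""


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname to its IP, ignoring blank lines and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def hosts_by_ip(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group hostnames by IP to reveal machines that play several roles."""
    grouped = defaultdict(list)
    for name, ip in mapping.items():
        grouped[ip].append(name)
    return dict(grouped)


print(hosts_by_ip(parse_hosts(HOSTS_TEXT)))
# {'192.168.0.20': ['master', 'slave02'], '192.168.0.21': ['slave01']}
```

The grouping makes it visible that 192.168.0.20 answers to both master and slave02, which is why that machine later runs both the master daemons and a DataNode/TaskTracker pair.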



Installation steps

1. Prerequisites



2. Add an account for hadoop

$sudo adduser hadoop

 


3. Create the directory structure hadoop will use

/home/hadoop/temp --> temp directory (used as scratch space during map and reduce when hadoop runs)


4. Generate an ssh key and register it in authorized_keys (this allows login without typing a password)

hadoop@ubuntu:~$ssh-keygen -t rsa -P ""

hadoop@ubuntu:~$cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

hadoop@ubuntu:~$scp ~/.ssh/authorized_keys hadoop@[slave servers]:~/.ssh/
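One caveat with the `cp` step above: it overwrites any keys already present in authorized_keys. A gentler variant appends the key only when it is missing. The sketch below is my own illustration, with `authorize_key` as a hypothetical helper name; it assumes the usual OpenSSH convention of one key per line with 0600 permissions on the file.

```python
from pathlib import Path


def authorize_key(pubkey_path: Path, authorized_keys_path: Path) -> bool:
    """Append the public key unless it is already listed.

    Returns True if the authorized_keys file was changed.
    """
    key = pubkey_path.read_text().strip()
    current = ""
    if authorized_keys_path.exists():
        current = authorized_keys_path.read_text()
    if key in (line.strip() for line in current.splitlines()):
        return False  # already registered, nothing to do

    authorized_keys_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with authorized_keys_path.open("a") as handle:
        # Make sure the new key starts on its own line.
        if current and not current.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    authorized_keys_path.chmod(0o600)
    return True
```

For example, `authorize_key(Path.home() / ".ssh/id_rsa.pub", Path.home() / ".ssh/authorized_keys")` can be run repeatedly without duplicating the entry.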


5. Download the hadoop package and install it (extract the archive)

hadoop@ubuntu:~$tar xvfz hadoop-1.0.4.tar.gz

hadoop@ubuntu:~$mkdir bin

hadoop@ubuntu:~$mv hadoop-1.0.4 ./bin/

hadoop@ubuntu:~$cd bin

hadoop@ubuntu:~/bin$ln -s hadoop-1.0.4 hadoop


6. Install the Java JDK

hadoop@ubuntu:~$./jdk-6u37-linux-i586.bin

hadoop@ubuntu:~$sudo mv jdk1.6.0_37 /usr/local/

hadoop@ubuntu:~$cd /usr/local

hadoop@ubuntu:/usr/local$sudo chown -R root:root /usr/local/jdk1.6.0_37

hadoop@ubuntu:/usr/local$sudo ln -s jdk1.6.0_37 java-6-sun


7. Edit the hadoop configuration files

1) Set JAVA_HOME (hadoop-env.sh)

hadoop@ubuntu:~/bin/hadoop/conf$ cat hadoop-env.sh 

# Set Hadoop-specific environment variables here.


# The only required environment variable is JAVA_HOME.  All others are

# optional.  When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.


# The java implementation to use.  Required.

export JAVA_HOME=/usr/local/java-6-sun


2) Edit the site files (core-site.xml, hdfs-site.xml, mapred-site.xml)


hadoop@ubuntu:~/bin/hadoop/conf$ cat core-site.xml 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://master:10001</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/temp</value>

</property>

</configuration>


hadoop@ubuntu:~/bin/hadoop/conf$ cat hdfs-site.xml 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>


hadoop@ubuntu:~/bin/hadoop/conf$ cat mapred-site.xml 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->


<configuration>

<property>

<name>mapred.job.tracker</name>

<value>master:10002</value>

</property>

</configuration>
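All three site files share the same `<configuration>/<property>/<name>/<value>` layout, so a few lines of code are enough to check what a node will actually pick up. This is a minimal sketch using only the standard library; `read_properties` is a hypothetical helper name, and the sample text repeats the core-site.xml values shown earlier.

```python
import xml.etree.ElementTree as ET

CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/temp</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> dict[str, str]:
    """Return the name/value pairs of a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


print(read_properties(CORE_SITE)["fs.default.name"])
# hdfs://master:10001
```

Running the same function over the files on each node and comparing the resulting dictionaries is a cheap way to catch a slave whose configuration has drifted from the master's.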


8. Format the hadoop namenode

hadoop@ubuntu:~/bin/hadoop$ bin/hadoop namenode -format

12/11/07 06:32:29 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = ubuntu/127.0.1.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.0.4

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012

************************************************************/

12/11/07 06:32:29 INFO util.GSet: VM type       = 32-bit

12/11/07 06:32:29 INFO util.GSet: 2% max memory = 19.33375 MB

12/11/07 06:32:29 INFO util.GSet: capacity      = 2^22 = 4194304 entries

12/11/07 06:32:29 INFO util.GSet: recommended=4194304, actual=4194304

12/11/07 06:32:30 INFO namenode.FSNamesystem: fsOwner=hadoop

12/11/07 06:32:30 INFO namenode.FSNamesystem: supergroup=supergroup

12/11/07 06:32:30 INFO namenode.FSNamesystem: isPermissionEnabled=true

12/11/07 06:32:30 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

12/11/07 06:32:30 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

12/11/07 06:32:30 INFO namenode.NameNode: Caching file names occuring more than 10 times 

12/11/07 06:32:31 INFO common.Storage: Image file of size 112 saved in 0 seconds.

12/11/07 06:32:31 INFO common.Storage: Storage directory /home/hadoop/temp/dfs/name has been successfully formatted.

12/11/07 06:32:31 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1

************************************************************/


9. Start hadoop (must be started on the master node)

hadoop@ubuntu:~/bin/hadoop/bin$ ./start-all.sh 

starting namenode, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-namenode-ubuntu.out

slave02: starting datanode, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-ubuntu.out

slave01: starting datanode, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-datanode-nuke-Satellite-A10.out

master: starting secondarynamenode, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-secondarynamenode-ubuntu.out

starting jobtracker, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-jobtracker-ubuntu.out

slave02: starting tasktracker, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-ubuntu.out

slave01: starting tasktracker, logging to /home/hadoop/bin/hadoop-1.0.4/libexec/../logs/hadoop-hadoop-tasktracker-nuke-Satellite-A10.out

hadoop@ubuntu:~/bin/hadoop/bin$ /usr/local/java-6-sun/bin/jps 

5471 NameNode

6010 JobTracker

6316 Jps

5710 DataNode

5927 SecondaryNameNode

6239 TaskTracker
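Eyeballing jps output gets tedious with more nodes, so here is a small sketch that reports which expected daemons are missing. The helper names and the split into master and worker daemon sets are my own assumptions, based on the processes listed above for a node that acts as both master and slave02.

```python
JPS_OUTPUT = """\
5471 NameNode
6010 JobTracker
6316 Jps
5710 DataNode
5927 SecondaryNameNode
6239 TaskTracker
"""

# Assumed role split for this Hadoop 1.0.4 setup.
MASTER_DAEMONS = {"NameNode", "SecondaryNameNode", "JobTracker"}
WORKER_DAEMONS = {"DataNode", "TaskTracker"}


def running_daemons(jps_text: str) -> set[str]:
    """Extract process names from jps output, skipping jps itself."""
    names = set()
    for line in jps_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[1] != "Jps":
            names.add(parts[1])
    return names


def missing_daemons(jps_text: str, expected: set[str]) -> set[str]:
    """Return the expected daemons that do not appear in the jps output."""
    return expected - running_daemons(jps_text)


print(missing_daemons(JPS_OUTPUT, MASTER_DAEMONS | WORKER_DAEMONS))
# set()
```

On slave01 only `WORKER_DAEMONS` would be expected, since the master daemons run on the desktop.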


Starting with the next post, I plan to upload example code that uses Hadoop.



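When running virtual servers on Linux, ipvsadm is the tool to use.
For example, if you want to build a load balancer that spreads the traffic arriving at a single IP across several destinations,
ipvsadm is essentially required.

Refer to the man page below to distribute the traffic for a single IP across multiple Linux machines.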

IPVSADM(8)                Linux Administrator's Guide               IPVSADM(8)


NAME
       ipvsadm - Linux Virtual Server administration

SYNOPSIS
       ipvsadm -A|E -t|u|f service-address [-s scheduler]
               [-p [timeout]] [-M netmask]
       ipvsadm -D -t|u|f service-address
       ipvsadm -C
       ipvsadm -R
       ipvsadm -S [-n]
       ipvsadm -a|e -t|u|f service-address -r server-address
               [-g|i|m] [-w weight] [-x upper] [-y lower]
       ipvsadm -d -t|u|f service-address -r server-address
       ipvsadm -L|l [options]
       ipvsadm -Z [-t|u|f service-address]
       ipvsadm --set tcp tcpfin udp
       ipvsadm --start-daemon state [--mcast-interface interface]
               [--syncid syncid]
       ipvsadm --stop-daemon state
       ipvsadm -h

DESCRIPTION
       Ipvsadm(8) is used to set up, maintain or inspect the virtual server table in the Linux kernel. The Linux Virtual Server
       can be used to build scalable network services based on a cluster of two or more nodes. The active node of  the  cluster
       redirects  service  requests to a collection of server hosts that will actually perform the services. Supported features
       include two protocols (TCP and UDP), three packet-forwarding methods (NAT, tunneling, and direct routing), and ten
       load balancing algorithms (round robin, weighted round robin, least-connection, weighted least-connection,
       locality-based least-connection, locality-based least-connection with replication, destination-hashing,
       source-hashing, shortest expected delay, and never queue).

       The command has two basic formats for execution:

       ipvsadm COMMAND [protocol] service-address
               [scheduling-method] [persistence options]

       ipvsadm command [protocol] service-address
               server-address [packet-forwarding-method]
               [weight options]

       The first format manipulates a virtual service and the algorithm for assigning service requests to real servers. Option-
       ally,  a  persistent  timeout  and network mask for the granularity of a persistent service may be specified. The second
       format manipulates a real server that is associated with an existing virtual service. When specifying a real server, the
       packet-forwarding  method and the weight of the real server, relative to other real servers for the virtual service, may
       be specified, otherwise defaults will be used.

   COMMANDS
       ipvsadm(8) recognises the commands described below. Upper-case commands maintain virtual services.  Lower-case  commands
       maintain real servers that are associated with a virtual service.

       -A, --add-service
              Add a virtual service. A service address is uniquely defined by a triplet: IP address, port number, and protocol.
              Alternatively, a virtual service may be defined by a firewall-mark.

       -E, --edit-service
              Edit a virtual service.

       -D, --delete-service
              Delete a virtual service, along with any associated real servers.

       -C, --clear
              Clear the virtual server table.

       -R, --restore
              Restore Linux Virtual Server rules from stdin. Each line read from stdin will be  treated  as  the  command  line
              options  to  a  separate  invocation of ipvsadm. Lines read from stdin can optionally begin with "ipvsadm".  This
              option is useful to avoid executing a large number of ipvsadm commands when constructing an extensive routing
              table.

       -S, --save
              Dump the Linux Virtual Server rules to stdout in a format that can be read by -R|--restore.

       -a, --add-server
              Add a real server to a virtual service.

       -e, --edit-server
              Edit a real server in a virtual service.

       -d, --delete-server
              Remove a real server from a virtual service.

       -L, -l, --list
              List  the  virtual  server table if no argument is specified. If a service-address is selected, list this service
              only. If the -c option is selected, then display the connection table. The exact output is affected by the  other
              arguments given.

       -Z, --zero
              Zero the packet, byte and rate counters in a service or all services.

       --set tcp tcpfin udp
              Change  the  timeout values used for IPVS connections. This command always takes 3 parameters,  representing  the
              timeout  values (in seconds) for TCP sessions, TCP sessions after receiving a  FIN  packet,  and   UDP   packets,
              respectively.  A timeout value 0 means that the current timeout value of the  corresponding  entry  is preserved.

       --start-daemon state
              Start the connection synchronization daemon. The state is to indicate that the daemon is  started  as  master  or
              backup.  The  connection synchronization daemon is implemented inside the Linux kernel. The master daemon running
              at the primary load balancer multicasts changes of connections periodically, and the backup daemon running at the
              backup load balancers receives multicast message and creates corresponding connections. Then, in case the primary
              load balancer fails, a backup load balancer will takeover, and it has state of almost all  connections,  so  that
              almost all established connections can continue to access the service.

       --stop-daemon
              Stop the connection synchronization daemon.

       -h, --help
              Display a description of the command syntax.

   PARAMETERS
       The commands above accept or require zero or more of the following parameters.

       -t, --tcp-service service-address
              Use  TCP  service.  The  service-address  is of the form host[:port].  Host may be one of a plain IP address or a
              hostname. Port may be either a plain port number or the service name of port. The Port may be omitted,  in  which
              case zero will be used. A Port  of zero is only valid if the service is persistent as the -p|--persistent option,
              in which case it is a wild-card port, that is connections will be accepted to any port.

       -u, --udp-service service-address
              Use UDP service. See the -t|--tcp-service for the description of  the service-address.

       -f, --fwmark-service integer
              Use a firewall-mark, an integer value greater than zero, to denote a virtual service instead of an address,  port
              and  protocol  (UDP or TCP). The marking of packets with a firewall-mark is configured using the -m|--mark option
              to iptables(8). It can be used to build a virtual service associated with the same real servers, covering
              multiple IP address, port and protocol triplets.

              Using  firewall-mark  virtual  services provides a convenient method of grouping together different IP addresses,
              ports and protocols into a single virtual service. This is useful for both simplifying configuration if  a  large
              number  of virtual services are required and grouping persistence across what would otherwise be multiple virtual
              services.

       -s, --scheduler scheduling-method
              scheduling-method  Algorithm for allocating TCP connections and UDP datagrams to real servers.  Scheduling  algo-
              rithms are implemented as kernel modules. Ten are shipped with the Linux Virtual Server:

              rr - Round Robin: distributes jobs equally amongst the available real servers.

              wrr - Weighted Round Robin: assigns jobs to real servers proportionally to the real servers' weight. Servers
              with higher weights receive new jobs first and get more jobs than servers with lower weights. Servers with  equal
              weights get an equal distribution of new jobs.

              lc - Least-Connection: assigns more jobs to real servers with fewer active jobs.

              wlc  -  Weighted Least-Connection: assigns more jobs to servers with fewer jobs and relative to the real servers'
              weight (Ci/Wi). This is the default.

              lblc - Locality-Based Least-Connection: assigns jobs destined for the same IP address to the same server  if  the
              server  is not overloaded and available; otherwise assign jobs to servers with fewer jobs, and keep it for future
              assignment.

              lblcr - Locality-Based Least-Connection with Replication: assigns jobs destined for the same IP  address  to  the
              least-connection node in the server set for the IP address. If all the nodes in the server set are overloaded, it
              picks up a node with fewer jobs in the cluster and adds it to the server set for the target. If the server set has
              not  been modified for the specified time, the most loaded node is removed from the server set, in order to avoid
              high degree of replication.

              dh - Destination Hashing: assigns jobs to servers through looking up a statically assigned hash  table  by  their
              destination IP addresses.

              sh  - Source Hashing: assigns jobs to servers through looking up a statically assigned hash table by their source
              IP addresses.

              sed - Shortest Expected Delay: assigns an incoming job to the  server  with  the  shortest  expected  delay.  The
              expected delay that the job will experience is (Ci + 1) / Ui if sent to the ith server, in which Ci is the
              number of jobs on the ith server and Ui is the fixed service rate (weight) of the ith server.

              nq - Never Queue: assigns an incoming job to an idle server if there is, instead of waiting for a  fast  one;  if
              all the servers are busy, it adopts the Shortest Expected Delay policy to assign the job.
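To make the two formulas in the scheduler list concrete, here is a small sketch of how wlc (smallest Ci/Wi) and sed (smallest (Ci + 1)/Ui) would choose between the same two servers. This is only an illustration of the arithmetic described above, not the kernel's implementation; the `RealServer` class and function names are hypothetical, and excluding weight-0 servers follows the description of quiescent servers under -w.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    name: str
    weight: int       # Wi (called Ui in the sed description)
    active_jobs: int  # Ci


def pick_wlc(servers: list[RealServer]) -> RealServer:
    """Weighted Least-Connection: the server with the smallest Ci / Wi."""
    candidates = [s for s in servers if s.weight > 0]  # skip quiescent servers
    return min(candidates, key=lambda s: s.active_jobs / s.weight)


def pick_sed(servers: list[RealServer]) -> RealServer:
    """Shortest Expected Delay: the server with the smallest (Ci + 1) / Ui."""
    candidates = [s for s in servers if s.weight > 0]  # skip quiescent servers
    return min(candidates, key=lambda s: (s.active_jobs + 1) / s.weight)


pool = [RealServer("a", weight=1, active_jobs=0),
        RealServer("b", weight=4, active_jobs=2)]

print(pick_wlc(pool).name)  # a  (0/1 = 0.0 beats 2/4 = 0.5)
print(pick_sed(pool).name)  # b  (3/4 = 0.75 beats 1/1 = 1.0)
```

The example shows where the two differ: wlc sends the job to the idle but slow server, while sed accounts for the new job itself and prefers the faster server even though it is already busy.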

       -p, --persistent [timeout]
              Specify  that  a  virtual service is persistent. If this option is specified, multiple requests from a client are
              redirected to the same real server selected for the first request.  Optionally, the timeout  of  persistent  ses-
              sions  may  be  specified given in seconds, otherwise the default of 300 seconds will be used. This option may be
              used in conjunction with protocols such as SSL or FTP where it is important  that  clients  consistently  connect
              with the same real server.

              Note:  If  a virtual service is to handle FTP connections then persistence must be set for the virtual service if
              Direct Routing or Tunnelling is used as the forwarding mechanism. If Masquerading is used in conjunction with  an
              FTP service then persistence is not necessary, but the ip_vs_ftp kernel module must be used. This module may be
              manually inserted into the kernel using insmod(8).

       -M, --netmask netmask
              Specify the granularity with which clients are grouped for persistent virtual services.  The  source  address  of
              the request is masked with this netmask to direct all clients from a network to the same real server. The default
              is 255.255.255.255, that is, the persistence granularity is per client host. Less specific netmasks may  be  used
              to resolve problems with non-persistent cache clusters on the client side.
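The effect of the persistence netmask is easy to see by masking a few client addresses. The sketch below only illustrates the grouping rule described for -M; `persistence_key` is a hypothetical helper name, and the client addresses are made up.

```python
import ipaddress


def persistence_key(client_ip: str, netmask: str = "255.255.255.255") -> str:
    """Mask the client address; clients with the same key share a real server."""
    ip = int(ipaddress.IPv4Address(client_ip))
    mask = int(ipaddress.IPv4Address(netmask))
    return str(ipaddress.IPv4Address(ip & mask))


print(persistence_key("10.1.2.3"))                     # 10.1.2.3
print(persistence_key("10.1.2.3", "255.255.255.0"))    # 10.1.2.0
print(persistence_key("10.1.2.200", "255.255.255.0"))  # 10.1.2.0
```

With the default mask every client host is its own group; with 255.255.255.0 both clients fall into the 10.1.2.0 group and would be directed to the same real server.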

       -r, --real-server server-address
              Real server that an associated request for service may be assigned to.  The server-address is the host address of
              a real server, and may optionally include a port. Host can be either a plain IP address or a hostname. Port can be either a
              plain  port number or the service name of port.  In the case of the masquerading method, the host address is usu-
              ally an RFC 1918 private IP address, and the port can be different from that of the associated service. With  the
              tunneling and direct routing methods, port must be equal to that of the service address. For normal services, the
              port specified  in the service address will be used if port is not specified. For fwmark services,  port  may  be
              omitted,  in which case  the destination port on the real server will be the destination port of the request sent
              to the virtual service.

       [packet-forwarding-method]

              -g, --gatewaying  Use gatewaying (direct routing). This is the default.

              -i, --ipip  Use ipip encapsulation (tunneling).

              -m, --masquerading  Use masquerading (network address translation, or NAT).

              Note: Regardless of the packet-forwarding mechanism specified, real servers for addresses for which there are
              interfaces on the local node will use the local forwarding method, then packets for the servers will be passed
              to upper layer on the local node. This cannot be specified by ipvsadm, rather it is set by the kernel as real
              servers are added or modified.

       -w, --weight weight
              Weight is an integer specifying the capacity  of a server relative to the others in the pool. The valid values of
              weight are 0 through to 65535. The default is 1. Quiescent servers are specified with a weight of zero. A  quies-
              cent server will receive no new jobs but still serve the existing jobs, for all scheduling algorithms distributed
              with the Linux Virtual Server. Setting a quiescent server may be useful if the server is overloaded or  needs  to
              be taken out of service for maintenance.

       -x, --u-threshold uthreshold
              uthreshold  is  an  integer specifying the upper connection threshold of a server. The valid values of uthreshold
              are 0 through to 65535. The default is 0, which means the upper connection threshold is not set. If uthreshold is
              set  with  other values, no new connections will be sent to the server when the number of its connections exceeds
              its upper connection threshold.

       -y, --l-threshold lthreshold
              lthreshold is an integer specifying the lower connection threshold of a server. The valid  values  of  lthreshold
              are 0 through to 65535. The default is 0, which means the lower connection threshold is not set. If lthreshold is
              set with other values, the server will receive new connections when the number of its connections drops below its
              lower connection threshold. If lthreshold is not set but uthreshold is set, the server will receive new
              connections when the number of its connections drops below three fourths of its upper connection threshold.
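The interplay of the two thresholds amounts to hysteresis: a server is cut off above the upper threshold and readmitted only after dropping below the lower one. The sketch below is my reading of the -x and -y descriptions, not kernel code; `ServerState` and `update` are hypothetical names, and a server with no upper threshold is assumed never to be cut off.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    upper: int = 0            # 0 means the upper threshold is not set
    lower: int = 0            # 0 means the lower threshold is not set
    overloaded: bool = False

    def update(self, connections: int) -> bool:
        """Record the current connection count.

        Returns True if the server may receive new connections.
        """
        if self.upper == 0:
            self.overloaded = False
            return True
        # Without an explicit lower threshold, fall back to 3/4 of the upper one.
        resume_below = self.lower if self.lower > 0 else self.upper * 3 / 4
        if connections > self.upper:
            self.overloaded = True
        elif connections < resume_below:
            self.overloaded = False
        return not self.overloaded


server = ServerState(upper=100)
print(server.update(101))  # False: exceeded the upper threshold
print(server.update(90))   # False: still at or above 75, stays cut off
print(server.update(74))   # True: dropped below three fourths of 100
```

The gap between the two thresholds keeps a server that hovers around its limit from flapping in and out of the pool.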

       --mcast-interface interface
              Specify the multicast interface that the sync master daemon sends outgoing multicasts through, or the sync backup
              daemon listens to for multicasts.

       --syncid syncid
              Specify  the  syncid  that the sync master daemon fills in the SyncID header while sending multicast messages, or
              the sync backup daemon uses to filter out multicast messages not matched with the SyncID value. The valid  values
              of syncid are 0 through to 255. The default is 0, which means no filtering at all.

       -c, --connection
              Connection output. The list command with this option will list current IPVS connections.

       --timeout
              Timeout output. The list command with this option will display the  timeout values (in seconds) for TCP sessions,
              TCP sessions after receiving a FIN packet, and UDP packets.

       --daemon
              Daemon information output. The list command with this option will display the daemon  status  and  its  multicast
              interface.

       --stats
              Output  of  statistics  information. The list command with this option will display the statistics information of
              services and their servers.

       --rate Output of rate information. The list command with this option will display the rate information (such as  connec-
              tions/second, bytes/second and packets/second) of services and their servers.

       --thresholds
              Output  of  thresholds  information.  The  list  command with this option will display the upper/lower connection
              threshold information of each server in service listing.

       --persistent-conn
              Output of persistent connection information. The list command with this option will display the  persistent  con-
              nection  counter  information of each server in service listing. The persistent connection is used to forward the
              actual connections from the same client/network to the same server.

       --sort Sort the list of virtual services and real servers. The virtual service entries are sorted in ascending order  by
              <protocol, address, port>. The real server entries are sorted in ascending order by <address, port>.

       -n, --numeric
              Numeric output. IP addresses and port numbers will be printed in numeric format rather than as host names and
              services respectively, which is the  default.

EXAMPLE 1 - Simple Virtual Service
       The following commands configure a Linux Director to distribute incoming requests addressed to port 80 on 207.175.44.110
       equally  to  port  80  on  five  real  servers. The forwarding method used in this example is NAT, with each of the real
       servers being masqueraded by the Linux Director.

       ipvsadm -A -t 207.175.44.110:80 -s rr
       ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.1:80 -m
       ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.2:80 -m
       ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.3:80 -m
       ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.4:80 -m
       ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.5:80 -m
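The six commands above follow a regular pattern, so the option lines can be generated rather than typed. The sketch below builds them in the form that -R|--restore reads from stdin (one set of command line options per line, as described under COMMANDS). `restore_rules` is a hypothetical helper name, and the function only produces text; it does not call ipvsadm.

```python
def restore_rules(vip: str, real_servers: list[str],
                  scheduler: str = "rr", forwarding: str = "-m") -> str:
    """Build option lines for one virtual service and its real servers."""
    lines = [f"-A -t {vip} -s {scheduler}"]
    lines += [f"-a -t {vip} -r {server} {forwarding}" for server in real_servers]
    return "\n".join(lines) + "\n"


reals = [f"192.168.10.{i}:80" for i in range(1, 6)]
print(restore_rules("207.175.44.110:80", reals), end="")
```

Saving the printed text to a file and feeding it to `ipvsadm -R` as root would be the intended use; keeping the generation separate from the privileged command makes the rules easy to review first.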

       Alternatively, this could be achieved in a single ipvsadm command.

       echo "
       -A -t 207.175.44.110:80 -s rr
       -a -t 207.175.44.110:80 -r 192.168.10.1:80 -m
       -a -t 207.175.44.110:80 -r 192.168.10.2:80 -m
       -a -t 207.175.44.110:80 -r 192.168.10.3:80 -m
       -a -t 207.175.44.110:80 -r 192.168.10.4:80 -m
       -a -t 207.175.44.110:80 -r 192.168.10.5:80 -m
       " | ipvsadm -R

       As masquerading is used as the forwarding mechanism in this example, the default route of the real servers must  be  set
       to  the  linux  director, which will need to be configured to forward and masquerade packets. This can be achieved using
       the following commands:

       echo "1" > /proc/sys/net/ipv4/ip_forward

EXAMPLE 2 - Firewall-Mark Virtual Service
       The following commands configure a Linux Director to distribute incoming requests addressed to any port on
       207.175.44.110 or 207.175.44.111 equally to the corresponding port on five real servers. As per the previous example,
       the forwarding method used in this example is NAT, with each of the real servers being masqueraded by the Linux
       Director.

       ipvsadm -A -f 1  -s rr
       ipvsadm -a -f 1 -r 192.168.10.1:0 -m
       ipvsadm -a -f 1 -r 192.168.10.2:0 -m
       ipvsadm -a -f 1 -r 192.168.10.3:0 -m
       ipvsadm -a -f 1 -r 192.168.10.4:0 -m
       ipvsadm -a -f 1 -r 192.168.10.5:0 -m

       As masquerading is used as the forwarding mechanism in this example, the default route of the real servers must be set
       to the Linux Director, which will need to be configured to forward and masquerade packets. The Linux Director should
       also be configured to mark incoming packets addressed to any port on 207.175.44.110 and 207.175.44.111 with
       firewall-mark 1. If FTP traffic is to be handled by this virtual service, then the ip_vs_ftp kernel module needs to
       be inserted into the kernel. These operations can be achieved using the following commands:

       echo "1" > /proc/sys/net/ipv4/ip_forward
       modprobe ip_tables
       iptables  -A PREROUTING -t mangle -d 207.175.44.110/31 -j MARK --set-mark 1
       modprobe ip_vs_ftp
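The iptables rule above matches `207.175.44.110/31`, which is how a single rule covers both virtual addresses. The sketch below checks that arithmetic; `ip_to_int` and `in_prefix` are hypothetical helper names introduced only for this illustration, and the shell is assumed to use 64-bit arithmetic:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_prefix ADDR NETWORK PREFIXLEN
# Succeeds when ADDR falls inside NETWORK/PREFIXLEN.
in_prefix() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

for a in 207.175.44.109 207.175.44.110 207.175.44.111 207.175.44.112; do
    if in_prefix "$a" 207.175.44.110 31; then
        echo "$a: would be marked"
    else
        echo "$a: not matched"
    fi
done
```

Only .110 and .111 fall inside the /31, so they are the only destinations that receive firewall-mark 1.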

NOTES
       The Linux Virtual Server implements three defense strategies against some types of denial of service (DoS) attacks. The
       Linux Director creates an entry for each connection in order to keep its state, and each entry occupies 128 bytes of
       effective memory. LVS's vulnerability to a DoS attack lies in the potential to increase the number of entries as much
       as possible until the Linux Director runs out of memory. The three defense strategies against the attack are: randomly
       drop some entries in the table; drop 1/rate packets before forwarding them; and use a secure TCP state transition
       table and short timeouts. The strategies are controlled by sysctl variables and corresponding entries in the /proc
       filesystem:

       /proc/sys/net/ipv4/vs/drop_entry
       /proc/sys/net/ipv4/vs/drop_packet
       /proc/sys/net/ipv4/vs/secure_tcp

       Valid values for each variable are 0 through 3. The default value is 0, which disables the respective defense
       strategy. 1 and 2 are automatic modes: when there is not enough available memory, the respective strategy is enabled
       and the variable is automatically set to 2; otherwise the strategy is disabled and the variable is set to 1. A value
       of 3 denotes that the respective strategy is always enabled. The available memory threshold and secure TCP timeouts
       can be tuned using the sysctl variables and corresponding entries in the /proc filesystem:

       /proc/sys/net/ipv4/vs/amemthresh
       /proc/sys/net/ipv4/vs/timeout_*
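As a rough illustration of the notes above, the sketch below maps a variable's value to the state described in the text and estimates how much memory a given number of connection entries occupies at 128 bytes each. The function names `describe_mode` and `table_bytes` are hypothetical and exist only for this example:

```shell
#!/bin/sh
# describe_mode VALUE
# Explain a drop_entry / drop_packet / secure_tcp setting as described above.
describe_mode() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "automatic, strategy currently off" ;;
        2) echo "automatic, strategy currently on" ;;
        3) echo "always enabled" ;;
        *) echo "invalid"; return 1 ;;
    esac
}

# table_bytes ENTRIES
# Memory used by ENTRIES connection entries at 128 bytes per entry.
table_bytes() {
    echo $(( $1 * 128 ))
}

describe_mode 2
table_bytes 1000000    # one million tracked connections
```

On a director, the current setting could be inspected with something like `describe_mode "$(cat /proc/sys/net/ipv4/vs/drop_entry)"`, assuming the ip_vs module is loaded so that the file exists.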

FILES
       /proc/net/ip_vs
       /proc/net/ip_vs_app
       /proc/net/ip_vs_conn
       /proc/net/ip_vs_stats
       /proc/sys/net/ipv4/vs/am_droprate
       /proc/sys/net/ipv4/vs/amemthresh
       /proc/sys/net/ipv4/vs/drop_entry
       /proc/sys/net/ipv4/vs/drop_packet
       /proc/sys/net/ipv4/vs/secure_tcp
       /proc/sys/net/ipv4/vs/timeout_close
       /proc/sys/net/ipv4/vs/timeout_closewait
       /proc/sys/net/ipv4/vs/timeout_established
       /proc/sys/net/ipv4/vs/timeout_finwait
       /proc/sys/net/ipv4/vs/timeout_icmp
       /proc/sys/net/ipv4/vs/timeout_lastack
       /proc/sys/net/ipv4/vs/timeout_listen
       /proc/sys/net/ipv4/vs/timeout_synack
       /proc/sys/net/ipv4/vs/timeout_synrecv
       /proc/sys/net/ipv4/vs/timeout_synsent
       /proc/sys/net/ipv4/vs/timeout_timewait
       /proc/sys/net/ipv4/vs/timeout_udp

SEE ALSO
       The LVS web site (http://www.linuxvirtualserver.org/) for more documentation about LVS.

       ipvsadm-save(8), ipvsadm-restore(8), iptables(8),
       insmod(8), modprobe(8)

AUTHORS
       ipvsadm - Wensong Zhang <wensong@linuxvirtualserver.org>
              Peter Kese <peter.kese@ijs.si>
       man page - Mike Wangsmo <wanger@redhat.com>
               Wensong Zhang <wensong@linuxvirtualserver.org>
               Horms <horms@verge.net.au>



4th Berkeley Distribution        5th July 2003                      IPVSADM(8)

 

rekun,ekun 커뉴

Does mankind possess anything in this world more certain than a dream?

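With ipvsadm, network traffic arriving at a single IP address can be distributed across several machines.
The machine that distributes the traffic is called the load balancer, and it is configured as described below.
Because the processes running on a load balancer do not demand much hardware, an inexpensive second-hand PC is enough to build one.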

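1. NAT (Network Address Translation) configuration example
Configure the load balancer to handle packet masquerading as follows.
(This rewrites the source address of packets on their way back out after they have been processed internally.)
Use only one of the two iptables rules, depending on how eth0 gets its address.

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE  # when eth0 has a dynamic IP
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source [eth0 ip]  # when eth0 has a static IP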

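When setting up a web server, define the scheduling algorithm for port 80 of the virtual server.

ipvsadm -A -t [virtual server IP = load balancer IP]:80 -s wlc

wlc stands for weighted least-connection scheduling. ipvsadm offers several scheduling algorithms to choose from, including rr (round robin), wrr (weighted round robin), lc (least-connection), and wlc (weighted least-connection).

Add the real servers to the load balancer as follows. Note that the option is a lowercase -r; an uppercase -R means "restore rules from stdin".

ipvsadm -a -t [virtual server IP = load balancer IP]:80 -r 192.168.0.11:80 -m
ipvsadm -a -t [virtual server IP = load balancer IP]:80 -r 192.168.0.12:80 -m -w 2  # weight 2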
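To see what the weight means under wlc, here is a simplified sketch of the selection rule: pick the server with the smallest connections-to-weight ratio. It is an illustration only, not the kernel's implementation (the in-kernel scheduler counts active and inactive connections differently), and `pick_wlc` is a hypothetical name:

```shell
#!/bin/sh
# pick_wlc: read "ADDRESS CONNECTIONS WEIGHT" lines from stdin and print the
# address with the lowest CONNECTIONS/WEIGHT ratio (first one wins on a tie).
pick_wlc() {
    best_addr=""; best_conns=0; best_weight=0
    while read -r addr conns weight; do
        # A weight of 0 means the server receives no new connections
        [ "$weight" -gt 0 ] || continue
        # Compare conns/weight < best_conns/best_weight without division
        if [ -z "$best_addr" ] ||
           [ $(( conns * best_weight )) -lt $(( best_conns * weight )) ]; then
            best_addr=$addr; best_conns=$conns; best_weight=$weight
        fi
    done
    echo "$best_addr"
}

# 192.168.0.11 has 30 connections at weight 1 (ratio 30);
# 192.168.0.12 has 40 connections at weight 2 (ratio 20).
printf '%s\n' "192.168.0.11 30 1" "192.168.0.12 40 2" | pick_wlc
```

With these numbers the weight-2 server is chosen even though it holds more connections, which is the effect of `-w 2` in the command above.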


Example) Five machines using round-robin scheduling

ipvsadm -A -t 207.175.44.110:80 -s rr

ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.1 -m
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.2 -m
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.3 -m
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.4 -m
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.5 -m
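Round robin simply hands each new connection to the next server in the list and wraps around at the end. The following sketch shows the rotation for the five servers above; `rr_pick` is a hypothetical helper, and the real scheduler's starting position may differ:

```shell
#!/bin/sh
# rr_pick N SERVER...
# Print the server that handles the N-th connection (counting from 0).
rr_pick() {
    n=$1; shift
    # Skip (N mod number-of-servers) entries, then take the first one left
    shift $(( n % $# ))
    echo "$1"
}

for i in 0 1 2 3 4 5 6; do
    echo "connection $i -> $(rr_pick "$i" 192.168.10.1 192.168.10.2 192.168.10.3 192.168.10.4 192.168.10.5)"
done
```

Connection 5 goes back to 192.168.10.1, so every server receives the same share of connections over time.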

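2. Direct routing configuration example
Environment:
System network         : 172.26.20.xxx
Load balancer address  : 172.26.20.111
Real server addresses  : 172.26.20.112, 172.26.20.113, 172.26.20.114
Virtual server address : 172.26.20.110
Clients outside connect to the virtual server address.

Load balancer configuration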

ifconfig eth0 172.26.20.111 netmask 255.255.255.0 broadcast 172.26.20.255 up
route add -net 172.26.20.0 netmask 255.255.255.0 dev eth0
ifconfig eth0:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev eth0:0

echo 1 > /proc/sys/net/ipv4/ip_forward

ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.113 -g
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.114 -g

On real server 1, configure the following. For the other real servers, change 112 to 113, 114, and so on.

ifconfig eth0 172.26.20.112 netmask 255.255.255.0 broadcast 172.26.20.255 up
route add -net 172.26.20.0 netmask 255.255.255.0 dev eth0
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0

echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g
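Since only the real server's own address changes from machine to machine, the address setup can be templated. The sketch below prints the ifconfig and route commands for each real server without running them; `real_server_cmds` is a hypothetical helper and covers only the four address-related lines:

```shell
#!/bin/sh
# real_server_cmds REAL_IP
# Print (not execute) the interface setup for one real server.
real_server_cmds() {
    rip=$1
    vip=172.26.20.110    # virtual server address from the environment above
    cat <<EOF
ifconfig eth0 $rip netmask 255.255.255.0 broadcast 172.26.20.255 up
route add -net 172.26.20.0 netmask 255.255.255.0 dev eth0
ifconfig lo:0 $vip netmask 255.255.255.255 broadcast $vip up
route add -host $vip dev lo:0
EOF
}

for rip in 172.26.20.112 172.26.20.113 172.26.20.114; do
    echo "# --- real server $rip ---"
    real_server_cmds "$rip"
done
```

The printed commands can then be reviewed and run as root on the matching machine.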

Then configure the following on each real server.

echo 1 > /proc/sys/net/ipv4/ip_forward
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden

Note: the NAT configuration has been tested; the direct routing configuration has not.

 


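Most home Internet routers (IP-sharing routers) use Linux as their OS.
Vendors build the Linux kernel as compactly as possible to fit a small flash memory and sell the result as a product.
In other words, an ordinary Linux machine can do the same job as such a router by using the right Linux feature.

That feature is provided by a program called iptables.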

iptables can be used on Linux as follows.
Example)
Configure the following on the Linux machine that shares the IP.
External IP : [public IP]
Internal IPs : 192.168.0.X
Ports reachable from outside : 80, 21, 22
External-facing network card : eth0

echo "1" > /proc/sys/net/ipv4/ip_forward
/sbin/iptables -P INPUT ACCEPT
/sbin/iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o eth0 -j SNAT --to [public IP]
/sbin/iptables -t nat -A PREROUTING -p tcp -d [public IP] --dport 80 -j DNAT --to 192.168.0.3:80
/sbin/iptables -t nat -A PREROUTING -p tcp -d [public IP] --dport 21 -j DNAT --to 192.168.0.4:21
/sbin/iptables -t nat -A PREROUTING -p tcp -d [public IP] --dport 22 -j DNAT --to 192.168.0.3:22
/sbin/iptables -A INPUT -i eth0 -j ACCEPT
/sbin/iptables -A OUTPUT -o eth0 -j ACCEPT
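The three PREROUTING rules follow one pattern, so they can be generated from a small port-to-host table. This is a minimal sketch that only prints the rules; `dnat_rules` is a hypothetical helper and 203.0.113.10 is a documentation-range address standing in for the public IP:

```shell
#!/bin/sh
# dnat_rules PUBLIC_IP PORT:INTERNAL_HOST...
# Print one DNAT port-forwarding rule per PORT:INTERNAL_HOST pair.
dnat_rules() {
    pub=$1; shift
    for map in "$@"; do
        port=${map%%:*}    # text before the first colon
        host=${map#*:}     # text after the first colon
        echo "/sbin/iptables -t nat -A PREROUTING -p tcp -d $pub --dport $port -j DNAT --to $host:$port"
    done
}

dnat_rules 203.0.113.10 80:192.168.0.3 21:192.168.0.4 22:192.168.0.3
```

Keeping the mapping in one argument list makes it harder to forward a port to the wrong internal machine when services are added later.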


iptables has been supported since kernel 2.3.15. Kernel 2.6 is now the mainstream, so simply running the commands above should let the Linux machine share its IP.

For a detailed description of iptables, run man iptables.

 
