---
layout: default
title: Subcollectives
---

[ActiveMQClustering]: /mcollective/reference/integration/activemq_clusters.html
[activemq_authorization]: /mcollective/deploy/middleware/activemq.html#authorization-group-permissions
[activemq_detailed]: /mcollective/deploy/middleware/activemq.html#detailed-restrictions
[activemq_subcollectives]: /mcollective/deploy/middleware/activemq.html#detailed-restrictions-with-multiple-subcollectives
[activemq_filtering]: /mcollective/deploy/middleware/activemq.html#destination-filtering
[activemq_authentication]: /mcollective/deploy/middleware/activemq.html#authentication-users-and-groups

## Overview

By default, all servers are part of a single broadcast domain.  If you have an
agent on all machines in your network and you send a message directed to
machines with that agent, every one of them will receive it regardless of
filters, because each server only evaluates the filters after the message has
arrived.

This works well for the common use case but can present problems in the
following scenarios:

 * You have a very big and busy network.  With thousands of machines responding
   to requests every 10 seconds, a very large number of messages is created
   and you might want to partition the traffic.
 * You have multiple data centers on different continents, and the latency and
   volume of traffic between them become too high.  You'd want to duplicate
   monitoring and management automation in each data center, but as it's all
   one broadcast domain you still see a large amount of traffic on the slow link.
 * You can't just run multiple separate installations because you still wish
   to retain the central management feature of MCollective.
 * Securing a flat network can be problematic.  SimpleRPC has a security
   framework that is aware of users and network topology, but the core network
   layer doesn't.

We've introduced the concept of subcollectives, which let you define broadcast
domains and configure an MCollective server to belong to one or more of these
domains.

## Partitioning Approaches

Determining how to partition your network can be a very complex subject and
requires an understanding of your message flow, where requestors sit, and the
topology of your middleware clusters.

Most middleware solutions will only send traffic where they know there is an
interest in it (a subscriber).  Therefore, if you had an agent on only 10 of
1000 machines, only those 10 machines would receive the associated traffic.
This is an important distinction to keep in mind.

![ActiveMQ Cluster](../../images/subcollectives-multiple-middleware.png)

We'll be working with the small 52-node collective shown above.  The
collective has machines in many data centers spread over 4 countries, and 3
ActiveMQ servers connected in a mesh.

Alongside each ActiveMQ node there is also a Puppet Master, a Nagios instance
and other shared infrastructure components.

An ideal setup for this network would be:

 * The MCollective NRPE and Puppetd agents on each of the 52 servers
 * A Puppet Commander at each of the 3 ActiveMQ locations
 * Nagios in each of the locations, monitoring the machines in its region
 * Regional traffic isolated and contained within its region as much as
   possible
 * Systems administrators and Registration data retaining the ability to
   target the whole collective

The problem with a single flat collective is that each of the 3 Puppet
Commanders will get a copy of all the traffic, even traffic it did not request,
which it will simply ignore.  The links between Europe and the US will see a
large number of messages traveling over them.  In a small 52-node network this
is manageable, but if each of the 4 locations had thousands of nodes the
situation would rapidly become untenable.

It seems natural, then, to create a number of broadcast domains, or
subcollectives:

 * A global collective that every machine belongs to
 * UK, DE, ZA and US collectives that contain just the machines in those regions
 * An EU collective that has the UK, DE and ZA machines
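
The membership rules in the list above can be captured in a small lookup table. The Python sketch below is illustrative only: `REGIONAL`, `EU_REGIONS`, `collectives_for` and `collectives_line` are hypothetical names, not part of MCollective. It assumes each node's region code (`uk`, `de`, `za` or `us`) is already known, for example from a config-management template.

```python
# Hypothetical helper: derive a node's collective memberships from its region.
# Every node joins the global "mcollective" collective; UK, DE and ZA nodes also
# join "eu_collective", following the partitioning scheme described above.

REGIONAL = {
    "uk": "uk_collective",
    "de": "de_collective",
    "za": "za_collective",
    "us": "us_collective",
}

EU_REGIONS = {"uk", "de", "za"}


def collectives_for(region: str) -> list[str]:
    """Return the collectives a node in `region` should belong to."""
    if region not in REGIONAL:
        raise ValueError(f"unknown region: {region!r}")
    collectives = ["mcollective", REGIONAL[region]]
    if region in EU_REGIONS:
        collectives.append("eu_collective")
    return collectives


def collectives_line(region: str) -> str:
    """Render the `collectives` directive for a node's server configuration."""
    return "collectives = " + ",".join(collectives_for(region))


if __name__ == "__main__":
    for region in ("de", "us"):
        print(collectives_line(region))
```

For a DE node this prints the same `collectives` line used in the DE node configuration under Configuring MCollective. A US node gets `collectives = mcollective,us_collective`.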

Visually, this arrangement might look like the diagram below:

![Subcollectives](../../images/subcollectives-collectives.png)

Notice how subcollectives can span broker boundaries: our EU collective has
nodes that would connect to both the UK and DE brokers.

We can now configure our Nagios instances and Puppet Commanders to communicate
only with their regional subcollectives, and the traffic for these collectives
will be contained regionally.

The graph below shows the impact of doing this: it shows traffic on the US
ActiveMQ instance before and after partitioning.  Even on a small network,
partitioning can have a big impact.

![Subcollectives](../../images/subcollectives-impact.png)

## Configuring MCollective

Configuring the partitioned collective above is fairly simple.  We'll look at
one of the DE nodes for reference:

{% highlight ini %}
collectives = mcollective,de_collective,eu_collective
main_collective = mcollective
{% endhighlight %}

The `collectives` directive lists all the collectives the node should belong
to, and the `main_collective` directive tells Registration which collective to
send its messages to.
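
Because Registration sends to the `main_collective`, it makes sense for that collective to be one the node actually belongs to. The following Python sketch is a hypothetical pre-deployment check, not MCollective's own configuration parser. It reads simple `key = value` lines, splits `collectives` on commas, and flags a `main_collective` that is missing or not in the list. It also flags repeated entries.

```python
# Hypothetical sanity check for the `collectives` and `main_collective` directives.
# This is a simplified reader for "key = value" lines, not MCollective's parser.


def parse_directives(text: str) -> dict[str, str]:
    """Collect `key = value` pairs, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_collectives(text: str) -> list[str]:
    """Return the problems found; an empty list means the check passed."""
    settings = parse_directives(text)
    collectives = [c.strip() for c in settings.get("collectives", "").split(",") if c.strip()]
    main = settings.get("main_collective")
    problems = []
    if len(set(collectives)) != len(collectives):
        problems.append("duplicate entries in collectives")
    if main is None:
        problems.append("main_collective is not set")
    elif main not in collectives:
        problems.append(f"main_collective {main!r} is not in collectives")
    return problems


de_node = """
collectives = mcollective,de_collective,eu_collective
main_collective = mcollective
"""
print(check_collectives(de_node))  # []
```

Running a check like this over generated configuration files is one way to catch a typo in a long `collectives` list before it reaches the nodes.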

## Testing

Testing that it works is pretty simple.  First we need a _client.cfg_ that
configures your client to talk to all the subcollectives:

{% highlight ini %}
collectives = mcollective,uk_collective,us_collective,de_collective,eu_collective,za_collective
main_collective = mcollective
{% endhighlight %}

You can now test with _mco ping_:

{% highlight console %}
$ mco ping -T us_collective
host1.us.my.net         time=200.67 ms
host2.us.my.net         time=241.30 ms
host3.us.my.net         time=245.24 ms
host4.us.my.net         time=275.42 ms
host5.us.my.net         time=279.90 ms
host6.us.my.net         time=283.61 ms
host7.us.my.net         time=370.56 ms


---- ping statistics ----
7 replies max: 370.56 min: 200.67 avg: 270.96
{% endhighlight %}

By passing other collectives to the `-T` argument you should see only the
members of those subcollectives.  If you do not specify a collective, the
request goes to the `main_collective` (here `mcollective`), so you should see
all machines.

Clients don't need to know about all collectives, only the ones they intend
to communicate with.

You can discover the list of known collectives and how many nodes are in each
using the _inventory_ application:

{% highlight console %}
$ mco inventory --list-collectives

 * [ ==================================== ] 52 / 52

   Collective                     Nodes
   ==========                     =====
   za_collective                  2
   us_collective                  7
   uk_collective                  19
   de_collective                  24
   eu_collective                  45
   mcollective                    52

                     Total nodes: 52

{% endhighlight %}
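
The overlapping totals in this output follow from the membership rules, because every node counts once toward each collective it belongs to. As an arithmetic illustration, not a description of how the inventory application builds its report, the Python sketch below reproduces the table from the regional rows shown (2 ZA, 7 US, 19 UK and 24 DE nodes). It uses a hypothetical `MEMBERSHIP` table that mirrors the partitioning scheme described earlier.

```python
# Arithmetic illustration: rebuild the per-collective node counts from the
# per-region node counts in the inventory output above.
from collections import Counter

# Nodes per region, taken from the regional rows of the inventory output.
NODES_PER_REGION = {"za": 2, "us": 7, "uk": 19, "de": 24}

# Hypothetical membership table mirroring the partitioning scheme above.
MEMBERSHIP = {
    "za": ["mcollective", "za_collective", "eu_collective"],
    "us": ["mcollective", "us_collective"],
    "uk": ["mcollective", "uk_collective", "eu_collective"],
    "de": ["mcollective", "de_collective", "eu_collective"],
}


def collective_sizes() -> Counter:
    """Count how many nodes belong to each collective."""
    sizes = Counter()
    for region, count in NODES_PER_REGION.items():
        for collective in MEMBERSHIP[region]:
            sizes[collective] += count
    return sizes


if __name__ == "__main__":
    # Sort by size so the order matches the inventory listing.
    for name, nodes in sorted(collective_sizes().items(), key=lambda kv: kv[1]):
        print(f"{name:<30} {nodes}")
    print("Total nodes:", sum(NODES_PER_REGION.values()))
```

The EU figure of 45 is simply 2 + 19 + 24. The global `mcollective` collective contains all 52 nodes, which is why the subcollective sizes add up to more than the total.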

## Partitioning for Security

Another possible advantage of subcollectives is security.  While the SimpleRPC
framework has a security model that is aware of the topology, the core network
layer is not.  Even if you only give someone access to run SimpleRPC requests
against some machines, they can still use _mco ping_ to discover other nodes on
your network.

By creating a subcollective of just their nodes and restricting them at the
middleware level to just that collective, you can effectively create a secure,
isolated zone that overlays your existing network.

These restrictions have to be configured on the middleware server, outside of MCollective itself. The method will vary based on the middleware you use; the suggestions below are for ActiveMQ, the main recommended middleware.

### Identifying Subcollectives on ActiveMQ

The ActiveMQ connector plugin identifies subcollectives with the **first segment** of every destination (topic or queue) name.

So for direct node addressing, for example, the default `mcollective` collective would use the `mcollective.nodes` queue, and `uk_collective` would use the `uk_collective.nodes` queue. For the package agent, they would use the `mcollective.package.agent` and `uk_collective.package.agent` topics, respectively.
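
The naming convention can be summarized in a short Python sketch. The helper names are hypothetical, and only the two destination kinds mentioned above are modeled: the per-collective `nodes` queue and the per-agent topics. The connector may use other destinations as well, which this sketch does not cover.

```python
# Sketch of the destination naming convention described above. Assumes collective
# names contain no dots, so the collective is always the first segment.


def node_queue(collective: str) -> str:
    """Queue used for direct node addressing within a collective."""
    return f"{collective}.nodes"


def agent_topic(collective: str, agent: str) -> str:
    """Topic used for requests to an agent within a collective."""
    return f"{collective}.{agent}.agent"


def collective_of(destination: str) -> str:
    """Recover the collective from a destination name: its first segment."""
    return destination.split(".", 1)[0]


for collective in ("mcollective", "uk_collective"):
    print(node_queue(collective), agent_topic(collective, "package"))
```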

Because the collective is always the first segment, ActiveMQ destination wildcards (such as `uk_collective.>`) make it easy to control access to a given collective.
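
To make the wildcard idea concrete, here is a simplified Python model of ActiveMQ's destination wildcard syntax. In that syntax, `.` separates segments, `*` matches exactly one segment, and `>` matches the remaining segments. This model was written for illustration and is not ActiveMQ's implementation. In particular, it requires `>` to be the last segment and to match at least one segment, which may differ from the broker in edge cases.

```python
# Simplified model of ActiveMQ destination wildcards, written for illustration
# (not ActiveMQ's own implementation):
#   "." separates segments
#   "*" matches exactly one segment
#   ">" matches one or more remaining segments and must be the last segment


def matches(pattern: str, destination: str) -> bool:
    pat = pattern.split(".")
    dest = destination.split(".")
    for i, segment in enumerate(pat):
        if segment == ">":
            # Only valid as the final segment; it needs at least one segment left to match.
            return i == len(pat) - 1 and len(dest) > i
        if i >= len(dest):
            return False  # the destination ran out of segments first
        if segment != "*" and segment != dest[i]:
            return False
    # Without ">", both names must have the same number of segments.
    return len(pat) == len(dest)


rule = "uk_collective.>"
for dest in ("uk_collective.nodes", "uk_collective.package.agent", "mcollective.package.agent"):
    print(f"{dest:<30} {matches(rule, dest)}")
```

The first two destinations belong to `uk_collective` and match the rule. The `mcollective` topic does not match, which is what lets an authorization rule on `uk_collective.>` confine a user to that collective.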

### Per-Subcollective Authorization

To control subcollective access, identify the set of topics and queues that collective will use, then use ActiveMQ's authorization rules to secure them.

* [See the "Authorization" section of the ActiveMQ config reference][activemq_authorization] for details.
* The ["Detailed Restrictions"][activemq_detailed] example shows all of the topics and queues used by the default collective; you can copy/paste/modify these for a small number of collectives.
* The ["Detailed Restrictions with Multiple Subcollectives"][activemq_subcollectives] example uses a snippet of ERB template to manage any number of collectives.
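
As a rough illustration of managing many collectives programmatically, the Python sketch below emits entries for the `authorizationEntries` element of ActiveMQ's authorization map. It is similar in spirit to the ERB template referenced above, but it is not a copy of it. For each collective it writes one entry for the topics and one for the queues, using ActiveMQ's `topic`/`queue` and `read`/`write`/`admin` attributes with a `<collective>.>` wildcard. The group names (`admins` and `<collective>-admins`) are hypothetical. A working broker configuration typically also needs entries for other destinations, such as ActiveMQ's advisory topics, as the linked reference describes.

```python
# Illustrative generator for per-collective ActiveMQ authorization entries.
# The group names ("admins", "<collective>-admins") are hypothetical, and each
# entry grants the same groups read, write and admin on everything in the
# collective, which is coarser than the detailed restrictions linked above.
import re


def authorization_entries(collectives: list[str], superusers: str = "admins") -> str:
    lines = []
    for collective in collectives:
        # Keep names simple so they can be embedded in XML attributes unescaped.
        if not re.fullmatch(r"[A-Za-z0-9_]+", collective):
            raise ValueError(f"unexpected collective name: {collective!r}")
        groups = f"{superusers},{collective}-admins"
        for kind in ("topic", "queue"):
            # "<collective>.>" covers every destination whose first segment is the collective.
            lines.append(
                f'<authorizationEntry {kind}="{collective}.>" '
                f'read="{groups}" write="{groups}" admin="{groups}" />'
            )
    return "\n".join(lines)


print(authorization_entries(["mcollective", "uk_collective"]))
```

Splitting server and client permissions per collective, as the detailed examples do, would mean emitting several entries per collective with different groups for each attribute.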

You must then [configure multiple ActiveMQ user accounts][activemq_authentication] for your site's admins. Give each user membership in the groups they'll need to manage their collectives.

### Destination Filters in a Network of Brokers

In a [network of brokers][ActiveMQClustering], you can also prevent subcollective traffic from propagating
across the network. See the ["Destination Filtering"][activemq_filtering] section of the ActiveMQ config reference for details.