
Tor Padding Specification

Mike Perry

Note: This is an attempt to specify Tor as currently implemented. Future
versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to use cover traffic to obscure
various traffic patterns from external and internal observers. Other
implementations MAY take other approaches, but implementors should be aware of
the anonymity and load-balancing implications of their choices.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

1. Overview

Tor supports two classes of cover traffic: connection-level padding, and
circuit-level padding.

Connection-level padding uses the CELL_PADDING cell command for cover
traffic, whereas circuit-level padding uses the RELAY_COMMAND_DROP relay
command. CELL_PADDING is single-hop only and can be differentiated from
normal traffic by Tor relays ("internal" observers), but not by entities
monitoring Tor OR connections ("external" observers).

RELAY_COMMAND_DROP is multi-hop, and is not visible to intermediate Tor
relays, because the relay command field is covered by circuit-layer
encryption. Moreover, Tor's 'recognized' field allows RELAY_COMMAND_DROP
padding to be sent to any intermediate node in a circuit (as per Section
6.1 of tor-spec.txt).

Currently, only single-hop CELL_PADDING is used by Tor. It is described in
Section 2. At a later date, further sections will be added to this document
to describe various uses of multi-hop circuit-level padding.

2. Connection-level padding

2.1. Background

Tor clients and relays make use of CELL_PADDING to reduce the resolution of
connection-level metadata retention by ISPs and surveillance infrastructure.
Such metadata retention is implemented by Internet routers in the form of
Netflow, jFlow, Netstream, or IPFIX records. These records are emitted by
gateway routers in a raw form and then exported (often over plaintext) to a
"collector" that either records them verbatim, or reduces their granularity
further[1].

Netflow records and the associated data collection and retention tools are
very configurable, and have many modes of operation, especially when
configured to handle high throughput. However, at ISP scale, per-flow records
are very likely to be employed, since they are the default, and also provide
very high resolution in terms of endpoint activity, second only to full packet
and/or header capture.

Per-flow records record the endpoint connection 5-tuple, as well as the
total number of bytes sent and received by that 5-tuple during a particular
time period. They can store additional fields as well, but it is primarily
the timing and byte-count information that concerns us.

When configured to provide per-flow data, routers emit these raw flow
records periodically for all active connections passing through them,
based on two parameters: the "active flow timeout" and the "inactive
flow timeout".

The "active flow timeout" causes the router to emit a new record
periodically for every active TCP session that continuously sends data. The
default active flow timeout for most routers is 30 minutes, meaning that a
new record is created for every TCP session at least every 30 minutes, no
matter what. This value can be configured from 1 minute to 60 minutes on
major routers.

The "inactive flow timeout" is used by routers to create a new record if a
TCP session is inactive for some number of seconds. It allows routers to
avoid the need to track a large number of idle connections in memory, and
instead emit a separate record only when there is activity. This value
ranges from 10 seconds to 600 seconds on common routers. It appears as
though no routers support a value lower than 10 seconds.

For reference, here are default values and ranges (in parentheses, when
known) for common routers, along with citations to their manuals.
Some routers speak collection protocols other than Netflow, and in the
case of Juniper, use different timeouts for these protocols. Where this
is known to happen, it has been noted.

                           Inactive Timeout    Active Timeout
    Cisco IOS[3]           15s  (10-600s)      30min (1-60min)
    Cisco Catalyst[4]      5min                32min
    Juniper (jFlow)[5]     15s  (10-600s)      30min (1-60min)
    Juniper (Netflow)[6,7] 60s  (10-600s)      30min (1-30min)
    H3C (Netstream)[8]     60s  (60-600s)      30min (1-60min)
    Fortinet[9]            15s                 30min
    MicroTik[10]           15s                 30min
    nProbe[14]             30s                 120s
    Alcatel-Lucent[2]      15s  (10-600s)      30min (1-600min)

The combination of the active and inactive netflow record timeouts allows us
to devise a low-cost padding defense that causes what would otherwise be
split records to "collapse" at the router even before they are exported to
the collector for storage. So long as a connection transmits data before the
"inactive flow timeout" expires, the router will continue to count the
total bytes on that flow before finally emitting a record at the "active
flow timeout".

This means that with a minimal amount of padding that prevents the "inactive
flow timeout" from expiring, it is possible to reduce the resolution of raw
per-flow netflow data to the total number of bytes sent and received in a 30
minute window. This is a vast reduction in resolution for HTTP, IRC, XMPP,
SSH, and other intermittent interactive traffic, especially when all
user traffic in that time period is multiplexed over a single connection
(as it is with Tor).
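The record-collapsing effect can be sketched with a toy model of a router's per-flow accounting. This is illustrative Python only, with simplified export rules and assumed constants; it is not taken from any router or Tor implementation:

```python
# A router exports the current per-flow record when the flow has been idle
# longer than the inactive timeout, or at least every active timeout;
# otherwise it keeps accumulating byte counts on the open record.
INACTIVE_TIMEOUT_S = 15        # common default from the table above
ACTIVE_TIMEOUT_S = 30 * 60

def count_flow_records(packet_times_s, session_end_s):
    """Count how many per-flow records a router would emit for a session."""
    records = 0
    flow_start = last_seen = packet_times_s[0]
    for t in packet_times_s[1:] + [session_end_s]:
        if t - last_seen > INACTIVE_TIMEOUT_S or t - flow_start >= ACTIVE_TIMEOUT_S:
            records += 1       # export the current record, start a new flow
            flow_start = t
        last_seen = t
    return records + 1         # final record for the still-open flow

# One packet a minute for 25 minutes: every gap exceeds the inactive
# timeout, so each burst of activity lands in its own record.
intermittent = list(range(0, 1500, 60))

# Adding a padding packet at least every 9.5 seconds keeps every gap under
# the inactive timeout, collapsing the session into a single record.
padded = sorted(set(intermittent) | set(range(0, 1500, 9)))
```

Under this model the intermittent session splits into dozens of records, while the padded session yields just one, carrying only the total byte counts.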

2.2. Implementation

Tor clients currently maintain one TLS connection to their Guard node to
carry actual application traffic, and make up to 3 additional connections to
other nodes to retrieve directory information.

We pad only the client's connection to the Guard node, and not any other
connection. We treat Bridge node connections to the Tor network as client
connections, and pad them, but we do not otherwise pad between normal relays.

Both clients and Guards will maintain a timer for all application (ie:
non-directory) TLS connections. Every time a non-padding packet is sent or
received by either end, that endpoint will sample a timeout value from
between 1.5 seconds and 9.5 seconds using the max(X,X) distribution
described in Section 2.3. The time range is subject to consensus
parameters as specified in Section 2.6.

If the connection becomes active for any reason before this timer
expires, the timer is reset to a new random value between 1.5 and 9.5
seconds. If the connection remains inactive until the timer expires, a
single CELL_PADDING cell will be sent on that connection.

In this way, the connection will only be padded in the event that it is
idle, and will always transmit a packet before the minimum 10 second inactive
timeout.
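The timer behavior above can be sketched as follows. This is a simplified model with hypothetical helper names, not the actual tor source:

```python
import random

NF_ITO_LOW_MS = 1500     # defaults from Section 2.6
NF_ITO_HIGH_MS = 9500

def sample_padding_timeout_ms(rng=random):
    # Y = max(X, X) with X ~ Uniform[low, high]; see Section 2.3 for why
    # each endpoint samples from max rather than directly from uniform.
    a = rng.uniform(NF_ITO_LOW_MS, NF_ITO_HIGH_MS)
    b = rng.uniform(NF_ITO_LOW_MS, NF_ITO_HIGH_MS)
    return max(a, b)

class PaddingTimer:
    """Per-connection timer: reset on any traffic, pad on expiry."""

    def __init__(self, now_ms):
        self.deadline_ms = now_ms + sample_padding_timeout_ms()

    def on_traffic(self, now_ms):
        # Any non-padding cell sent or received resets the timer
        # to a fresh sample.
        self.deadline_ms = now_ms + sample_padding_timeout_ms()

    def maybe_pad(self, now_ms, send_padding_cell):
        # If the connection stayed idle past the deadline, send a single
        # CELL_PADDING cell and re-arm the timer.
        if now_ms >= self.deadline_ms:
            send_padding_cell()
            self.deadline_ms = now_ms + sample_padding_timeout_ms()
```

Both endpoints run this logic independently, which is what produces the min(client_timeout, server_timeout) behavior analyzed in the next section.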

2.3. Padding Cell Timeout Distribution Statistics

It turns out that because the padding is bidirectional, and because both
endpoints are maintaining timers, this creates the situation where the time
before sending a padding packet in either direction is actually
min(client_timeout, server_timeout).

If client_timeout and server_timeout are uniformly sampled, then the
distribution of min(client_timeout,server_timeout) is no longer uniform, and
the resulting average timeout (Exp[min(X,X)]) is much lower than the
midpoint of the timeout range.

To compensate for this, instead of sampling each endpoint timeout uniformly,
we instead sample it from max(X,X), where X is uniformly distributed.

If X is a random variable uniform from 0..R-1 (where R=high-low), then the
random variable Y = max(X,X) has Prob(Y == i) = (2.0*i + 1)/(R*R).

Then, when both sides apply timeouts sampled from Y, the resulting
bidirectional padding packet rate is now a third random variable:
Z = min(Y,Y).

The distribution of Z is slightly bell-shaped, but mostly flat around the
mean. It also turns out that Exp[Z] ~= Exp[X]. Here's a table of average
values for each random variable:

      R      Exp[X]   Exp[Z]   Exp[min(X,X)]   Exp[Y=max(X,X)]
      2000   999.5    1066     666.2           1332.8
      3000   1499.5   1599.5   999.5           1999.5
      5000   2499.5   2666     1666.2          3332.8
      6000   2999.5   3199.5   1999.5          3999.5
      7000   3499.5   3732.8   2332.8          4666.2
      8000   3999.5   4266.2   2666.2          5332.8
      10000  4999.5   5328     3332.8          6666.2
      15000  7499.5   7995     4999.5          9999.5
      20000  9999.5   10661    6666.2          13332.8

In this way, we maintain the property that the midpoint of the timeout range
is the expected mean time before a padding packet is sent in either
direction.
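The table values follow directly from the distributions above. As a sketch, the expectations can be computed exactly from Prob(Y == i) = (2i+1)/R^2 and the corresponding tail probabilities:

```python
def exp_x(R):
    # X uniform on 0..R-1.
    return (R - 1) / 2

def exp_y(R):
    # Y = max(X, X): Prob(Y == i) = (2i + 1) / R^2.
    return sum(i * (2 * i + 1) for i in range(R)) / R**2

def exp_z(R):
    # Z = min(Y, Y) for two independent copies of Y:
    # Exp[Z] = sum_{i>=1} P(Y >= i)^2, with P(Y >= i) = 1 - (i/R)^2.
    return sum((1 - (i / R) ** 2) ** 2 for i in range(1, R))

def exp_min_x(R):
    # min(X, X): P(min >= i) = ((R - i)/R)^2.
    return sum(((R - i) / R) ** 2 for i in range(1, R))
```

For R = 2000 these give 999.5, ~1332.8, ~1066, and ~666.2 respectively, matching the first row of the table, with Exp[Z] close to Exp[X] as claimed.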

2.4. Maximum overhead bounds

With the default parameters and the above distribution, we expect a
padded connection to send one padding cell every 5.5 seconds. This
averages to 103 bytes per second full duplex (~52 bytes/sec in each
direction), assuming a 512 byte cell and 55 bytes of TLS+TCP+IP headers.

For a client connection that remains otherwise idle for its expected
~50 minute lifespan (governed by the circuit available timeout plus a
small additional connection timeout), this is about 154.5KB of overhead
in each direction (309KB total).

With 2.5M completely idle clients connected simultaneously, 52 bytes per
second amounts to 130MB/second in each direction network-wide, which is
roughly the current amount of Tor directory traffic[11]. Of course, our
2.5M daily users will neither be connected simultaneously, nor entirely
idle, so we expect the actual overhead to be much lower than this.
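These figures can be checked with back-of-the-envelope arithmetic using only the constants stated above:

```python
CELL_BYTES = 512
TLS_TCP_IP_HEADER_BYTES = 55
MEAN_PAD_INTERVAL_S = 5.5    # midpoint of the [1.5s, 9.5s] range

# One 567-byte padded cell every 5.5s on the connection, split evenly
# between the two directions on average.
bytes_per_sec = (CELL_BYTES + TLS_TCP_IP_HEADER_BYTES) / MEAN_PAD_INTERVAL_S
per_direction_bytes_per_sec = bytes_per_sec / 2

# Otherwise-idle connection over its expected ~50 minute lifespan.
lifespan_s = 50 * 60
overhead_each_dir_kb = per_direction_bytes_per_sec * lifespan_s / 1000

# Worst case: 2.5M idle clients connected simultaneously.
network_mb_per_sec = 2_500_000 * per_direction_bytes_per_sec / 1e6
```

This yields ~103 bytes/sec full duplex, ~154.6KB per direction per connection, and roughly 130MB/sec network-wide, consistent with the rounded figures above.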

2.5. Reducing or Disabling Padding via Negotiation

To allow mobile clients to either disable or reduce their padding overhead,
the CELL_PADDING_NEGOTIATE cell (tor-spec.txt section 7.2) may be sent from
clients to relays. This cell is used to instruct relays to cease sending
padding.

If the client has opted to use reduced padding, it continues to send
padding cells sampled from the range [9000,14000] milliseconds (subject to
consensus parameter alteration as per Section 2.6), still using the
Y=max(X,X) distribution. Since the padding is now unidirectional, the
expected frequency of padding cells is governed by the Y distribution
above as opposed to Z. For a range of 5000ms, we can see that we expect to
send a padding packet every 9000+3332.8 = 12332.8ms. We also halve the
circuit available timeout from ~50min down to ~25min, which causes the
client's OR connections to be closed shortly thereafter when idle,
thus reducing overhead.

These two changes cause the padding overhead to go from 309KB per one-time-use
Tor connection down to 69KB per one-time-use Tor connection. For continual
usage, the maximum overhead goes from 103 bytes/sec down to 46 bytes/sec.

If a client opts to completely disable padding, it sends a
CELL_PADDING_NEGOTIATE cell to instruct the relay not to pad, and then does
not send any further padding itself.
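The reduced-padding figures follow from the same arithmetic as Section 2.4, now using Exp[Y] over the [9000,14000] ms range and the halved ~25 minute lifespan (a sketch, not tor code):

```python
R_MS = 14000 - 9000    # reduced padding range width

# Exp[Y] for Y = max(X, X), X uniform on 0..R-1 (Section 2.3).
exp_y_ms = sum(i * (2 * i + 1) for i in range(R_MS)) / R_MS**2
mean_interval_ms = 9000 + exp_y_ms    # ~12332.8 ms between padding cells

# Unidirectional padding: one 567-byte padded cell per mean interval.
bytes_per_sec = (512 + 55) / (mean_interval_ms / 1000.0)

# Halved connection lifespan (~25 min) for reduced padding clients.
lifespan_s = 25 * 60
overhead_kb = bytes_per_sec * lifespan_s / 1000
```

This reproduces the ~46 bytes/sec continual rate and the ~69KB per one-time-use connection quoted above.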

2.6. Consensus Parameters Governing Behavior

Connection-level padding is controlled by the following consensus parameters:

  * nf_ito_low
    - The low end of the range to send padding when inactive, in ms.
    - Default: 1500

  * nf_ito_high
    - The high end of the range to send padding, in ms.
    - Default: 9500
    - If nf_ito_low == nf_ito_high == 0, padding will be disabled.

  * nf_ito_low_reduced
    - For reduced padding clients: the low end of the range to send padding
      when inactive, in ms.
    - Default: 9000

  * nf_ito_high_reduced
    - For reduced padding clients: the high end of the range to send padding,
      in ms.
    - Default: 14000

  * nf_conntimeout_clients
    - The number of seconds to keep circuits open and available for
      clients to use. Note that the actual client timeout is randomized
      uniformly from this value to twice this value. This governs client
      OR conn lifespan. Reduced padding clients use half the consensus
      value.
    - Default: 1800

  * nf_pad_before_usage
    - If set to 1, OR connections are padded before the client uses them
      for any application traffic. If 0, OR connections are not padded
      until application data begins.
    - Default: 1

  * nf_pad_relays
    - If set to 1, we also pad inactive relay-to-relay connections.
    - Default: 0

  * nf_conntimeout_relays
    - The number of seconds that idle relay-to-relay connections are kept
      open.
    - Default: 3600
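As a sketch of how the defaults and the disable rule fit together: the parameter names below come from this section, but the lookup helper itself is illustrative only and not tor's actual parameter-handling code:

```python
# Consensus defaults from this section.
DEFAULTS = {
    "nf_ito_low": 1500,
    "nf_ito_high": 9500,
    "nf_ito_low_reduced": 9000,
    "nf_ito_high_reduced": 14000,
    "nf_conntimeout_clients": 1800,
    "nf_pad_before_usage": 1,
    "nf_pad_relays": 0,
    "nf_conntimeout_relays": 3600,
}

def padding_range_ms(consensus, reduced=False):
    """Return the (low, high) padding range in ms, or None if disabled.

    'consensus' maps parameter names to values; missing entries fall back
    to the defaults above. (Hypothetical helper, not tor code.)
    """
    params = {**DEFAULTS, **consensus}
    if params["nf_ito_low"] == params["nf_ito_high"] == 0:
        return None    # nf_ito_low == nf_ito_high == 0 disables padding
    if reduced:
        return (params["nf_ito_low_reduced"], params["nf_ito_high_reduced"])
    return (params["nf_ito_low"], params["nf_ito_high"])
```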