Browse Source

Start a new guard-spec.txt as a copy of prop271.

Remove the old guard section of path-spec, now that guard-spec is
separate.
Nick Mathewson 3 years ago
parent
commit
6fddd69ec0
2 changed files with 824 additions and 117 deletions
  1. 818 0
      guard-spec.txt
  2. 6 117
      path-spec.txt

+ 818 - 0
guard-spec.txt

@@ -0,0 +1,818 @@
+                      Tor Guard Specification
+
+                           Isis Lovecruft
+                         George Kadianakis
+                              Ola Bini
+                           Nick Mathewson
+
+1. Introduction and motivation
+
+  Tor uses entry guards to prevent an attacker who controls some
+  fraction of the network from observing a fraction of every user's
+  traffic. If users chose their entries and exits uniformly at
+  random from the list of servers every time they build a circuit,
+  then an adversary who had (k/N) of the network would deanonymize
+  F=(k/N)^2 of all circuits... and after a given user had built C
+  circuits, the attacker would see them at least once with
+  probability 1-(1-F)^C.  With large C, the attacker would get a
+  sample of every user's traffic with probability 1.
+
+  To prevent this from happening, Tor clients choose a small number
+  of guard nodes (currently 3).  These guard nodes are the only
+  nodes that the client will connect to directly.  If they are not
+  compromised, the user's paths are not compromised.
+
+  This proposal outlines Tor's current guard selection algorithm,
+  which tries to meet the following goals:
+
+    - Heuristics and algorithms for determining how and which guards
+      are chosen should be kept as simple and easy to understand as
+      possible.
+
+    - Clients in censored regions or who are behind a fascist
+      firewall who connect to the Tor network should not experience
+      any significant disadvantage in terms of reachability or
+      usability.
+
+    - Tor should make a best attempt at discovering the most
+      appropriate behaviour, with as little user input and
+      configuration as possible.
+
+    - Tor clients should discover usable guards without too much
+      delay.
+
+    - Tor clients should resist (to the extent possible) attacks
+      that try to force them onto compromised guards.
+
+
+2. State instances
+
+   In the algorithm below, we describe a set of persistent and
+   non-persistent state variables.  These variables should be
+   treated as an object, of which multiple instances can exist.
+
+   In particular, we specify the use of three particular instances:
+
+     A. UseBridges
+
+      If UseBridges is set, then we replace the {GUARDS} set in
+      [Sec:GUARDS] below with the list of list of configured
+      bridges.  We maintain a separate persistent instance of
+      {SAMPLED_GUARDS} and {CONFIRMED_GUARDS} and other derived
+      values for the UseBridges case.
+
+      In this case, we impose no upper limit on the sample size.
+
+    B. EntryNodes / ExcludeNodes / Reachable*Addresses /
+        FascistFirewall / ClientUseIPv4=0
+
+      If one of the above options is set, and UseBridges is not,
+      then we compare the fraction of usable guards in the consensus
+      to the total number of guards in the consensus.
+
+      If this fraction is less than {MEANINGFUL_RESTRICTION_FRAC},
+      we use a separate instance of the state.
+
+      (While Tor is running, we do not change back and forth between
+      the separate instance of the state and the default instance
+      unless the fraction of usable guards is 5% higher than, or 5%
+      lower than, {MEANINGFUL_RESTRICTION_FRAC}.  This prevents us
+      from flapping back and forth between instances if we happen to
+      hit {MEANINGFUL_RESTRICTION_FRAC} exactly.
+
+      If this fraction is less than {EXTREME_RESTRICTION_FRAC}, we use a
+      separate instance of the state, and warn the user.
+
+      [TODO: should we have a different instance for each set of heavily
+      restricted options?]
+
+   C. Default
+
+      If neither of the above variant-state instances is used,
+      we use a default instance.
+
+3. Circuit Creation, Entry Guard Selection (1000 foot view)
+
+   A circuit in Tor is a path through the network connecting a client to
+   its destination. At a high-level, a three-hop exit circuit will look
+   like this:
+
+   Client <-> Entry Guard <-> Middle Node <-> Exit Node <-> Destination
+
+   Entry guards are the only nodes which a client will connect to
+   directly, Exit relays are the nodes by which traffic exists the
+   Tor network in order to connect to an external destination.
+
+   3.1 Path selection
+
+   For any circuit, at least one entry guard and middle node(s) are
+   required. An exit node is required if traffic will exit the Tor
+   network. Depending on its configuration, a relay listed in a
+   consensus could be used for any of these roles. However, this
+   proposal defines how entry guards specifically should be selected and
+   managed, as opposed to middle or exit nodes.
+
+   3.1.1 Entry guard selection
+
+   At a high level, a relay listed in a consensus will move through the
+   following states in the process from initial selection to eventual
+   usage as an entry guard:
+
+      relays listed in consensus
+                 |
+               sampled
+               |     |
+         confirmed   filtered
+               |     |      |
+               primary      usable_filtered
+
+   Relays listed in the latest consensus can be sampled for guard usage
+   if they have the "Guard" flag. Sampling is random but weighted by
+   bandwidth.
+
+   Once a path is built and a circuit established using this guard, it
+   is marked as confirmed. Until this point, guards are first sampled
+   and then filtered based on information such as our current
+   configuration (see SAMPLED and FILTERED sections) and later marked as
+   usable_filtered if the guard is not primary but can be reached.
+
+   It is always preferable to use a primary guard when building a new
+   circuit in order to reduce guard churn; only on failure to connect to
+   existing primary guards will new guards be used.
+
+   3.1.2 Middle and exit node selection
+
+   Middle nodes are selected at random from relays listed in the
+   latest consensus, weighted by bandwidth. Exit nodes are chosen
+   similarly but restricted to relays with an exit policy.
+
+   3.2 Circuit Building
+
+   Once a path is chosen, Tor will use this path to build a new circuit.
+
+   If the circuit is built successfully, it either can be used
+   immediately or wait for a better guard, depending on whether other
+   circuits already exist with higher-priority guards.
+
+   If at any point the circuit fails, the guard is marked as
+   unreachable, the circuit is closed, and waiting circuits are updated.
+
+4. The algorithm.
+
+4.0.  The guards listed in the current consensus. [Section:GUARDS]
+
+   By {set:GUARDS} we mean the set of all guards in the current
+   consensus that are usable for all circuits and directory
+   requests. (They must have the flags: Stable, Fast, V2Dir, Guard.)
+
+      **Rationale**
+
+   We require all guards to have the flags that we potentially need
+   from any guard, so that all guards are usable for all circuits.
+
+4.1.  The Sampled Guard Set. [Section:SAMPLED]
+
+   We maintain a set, {set:SAMPLED_GUARDS}, that persists across
+   invocations of Tor. It is an unordered subset of the nodes that
+   we have seen listed as a guard in the consensus at some point.
+   For each such guard, we record persistently:
+
+      - {pvar:ADDED_ON_DATE}: The date on which it was added to
+        sampled_guards.
+
+        We base this value on RAND(now, {GUARD_LIFETIME}/10). See
+        Appendix [RANDOM] below.
+
+      - {pvar:ADDED_BY_VERSION}: The version of Tor that added it to
+        sampled_guards.
+
+      - {pvar:IS_LISTED}: Whether it was listed as a usable Guard in
+        the _most recent_ consensus we have seen.
+
+      - {pvar:FIRST_UNLISTED_AT}: If IS_LISTED is false, the publication date
+        of the earliest consensus in which this guard was listed such that we
+        have not seen it listed in any later consensus.  Otherwise "None."
+        We randomize this, based on
+          RAND(added_at_time, {REMOVE_UNLISTED_GUARDS_AFTER} / 5)
+
+   For each guard in {SAMPLED_GUARDS}, we also record this data,
+   non-persistently:
+
+      - {tvar:last_tried_connect}: A 'last tried to connect at'
+        time.  Default 'never'.
+
+      - {tvar:is_reachable}: an "is reachable" tristate, with
+        possible values { <state:yes>, <state:no>, <state:maybe> }.
+        Default '<maybe>.'
+
+               [Note: "yes" is not strictly necessary, but I'm
+                making it distinct from "maybe" anyway, to make our
+                logic clearer.  A guard is "maybe" reachable if it's
+                worth trying. A guard is "yes" reachable if we tried
+                it and succeeded.]
+
+      - {tvar:failing_since}: The first time when we failed to
+        connect to this guard. Defaults to "never".  Reset to
+        "never" when we successfully connect to this guard.
+
+      - {tvar:is_pending} A "pending" flag.  This indicates that we
+        are trying to build an exploratory circuit through the
+        guard, and we don't know whether it will succeed.
+
+   We require that {SAMPLED_GUARDS} contain at least
+   {MIN_FILTERED_SAMPLE} guards from the consensus (if possible),
+   but not more than {MAX_SAMPLE_THRESHOLD} of the number of guards
+   in the consensus, and not more then {MAX_SAMPLE_SIZE} in total.
+   (But if the maximum would be smaller than {MIN_FILTERED_SAMPLE}, we
+   set the maximum at {MIN_FILTERED_SAMPLE}.)
+
+   To add a new guard to {SAMPLED_GUARDS}, pick an entry at random
+   from ({GUARDS} - {SAMPLED_GUARDS}), weighted by bandwidth.
+
+   We remove an entry from {SAMPLED_GUARDS} if:
+
+      * We have a live consensus, and {IS_LISTED} is false, and
+        {FIRST_UNLISTED_AT} is over {REMOVE_UNLISTED_GUARDS_AFTER}
+        days in the past.
+
+     OR
+
+      * We have a live consensus, and {ADDED_ON_DATE} is over
+        {GUARD_LIFETIME} ago, *and* {CONFIRMED_ON_DATE} is either
+        "never", or over {GUARD_CONFIRMED_MIN_LIFETIME} ago.
+
+   Note that {SAMPLED_GUARDS} does not depend on our configuration.
+   It is possible that we can't actually connect to any of these
+   guards.
+
+     **Rationale**
+
+   The {SAMPLED_GUARDS} set is meant to limit the total number of
+   guards that a client will connect to in a given period.  The
+   upper limit on its size prevents us from considering too many
+   guards.
+
+   The first expiration mechanism is there so that our
+   {SAMPLED_GUARDS} list does not accumulate so many dead
+   guards that we cannot add new ones.
+
+   The second expiration mechanism makes us rotate our guards slowly
+   over time.
+
+
+4.2. The Usable Sample [Section:FILTERED]
+
+   We maintain another set, {set:FILTERED_GUARDS}, that does not
+   persist. It is derived from:
+       - {SAMPLED_GUARDS}
+       - our current configuration,
+       - the path bias information.
+
+   A guard is a member of {set:FILTERED_GUARDS} if and only if all
+   of the following are true:
+
+       - It is a member of {SAMPLED_GUARDS}, with {IS_LISTED} set to
+         true.
+       - It is not disabled because of path bias issues.
+       - It is not disabled because of ReachableAddress police,
+         the ClientUseIPv4 setting, the ClientUseIPv6 setting,
+         the FascistFirewall setting, or some other
+         option that prevents using some addresses.
+       - It is not disabled because of ExcludeNodes.
+       - It is a bridge if UseBridges is true; or it is not a
+         bridge if UseBridges is false.
+       - Is included in EntryNodes if EntryNodes is set and
+         UseBridges is not. (But see 2.B above).
+
+   We have an additional subset, {set:USABLE_FILTERED_GUARDS}, which
+   is defined to be the subset of {FILTERED_GUARDS} where
+   {is_reachable} is <yes> or <maybe>.
+
+   We try to maintain a requirement that {USABLE_FILTERED_GUARDS}
+   contain at least {MIN_FILTERED_SAMPLE} elements:
+
+     Whenever we are going to sample from {USABLE_FILTERED_GUARDS},
+     and it contains fewer than {MIN_FILTERED_SAMPLE} elements, we
+     add new elements to {SAMPLED_GUARDS} until one of the following
+     is true:
+
+       * {USABLE_FILTERED_GUARDS} is large enough,
+     OR
+       * {SAMPLED_GUARDS} is at its maximum size.
+
+
+     ** Rationale **
+
+  These filters are applied _after_ sampling: if we applied them
+  before the sampling, then our sample would reflect the set of
+  filtering restrictions that we had in the past.
+
+4.3. The confirmed-guard list. [Section:CONFIRMED]
+
+  [formerly USED_GUARDS]
+
+  We maintain a persistent ordered list, {list:CONFIRMED_GUARDS}.
+  It contains guards that we have used before, in our preference
+  order of using them.  It is a subset of {SAMPLED_GUARDS}.  For
+  each guard in this list, we store persistently:
+
+      - {pvar:IDENTITY} Its fingerprint
+
+      - {pvar:CONFIRMED_ON_DATE} When we added this guard to
+        {CONFIRMED_GUARDS}.
+
+        Randomized as RAND(now, {GUARD_LIFETIME}/10).
+
+  We add new members to {CONFIRMED_GUARDS} when we mark a circuit
+  built through a guard as "for user traffic."
+
+  Whenever we remove a member from {SAMPLED_GUARDS}, we also remove
+  it from {CONFIRMED_GUARDS}.
+
+      [Note: You can also regard the {CONFIRMED_GUARDS} list as a
+      total ordering defined over a subset of {SAMPLED_GUARDS}.]
+
+  Definition: we call Guard A "higher priority" than another Guard B
+  if, when A and B are both reachable, we would rather use A.  We
+  define priority as follows:
+
+     * Every guard in {CONFIRMED_GUARDS} has a higher priority
+       than every guard not in {CONFIRMED_GUARDS}.
+
+     * Among guards in {CONFIRMED_GUARDS}, the one appearing earlier
+       on the {CONFIRMED_GUARDS} list has a higher priority.
+
+     * Among guards that do not appear in {CONFIRMED_GUARDS},
+       {is_pending}==true guards have higher priority.
+
+     * Among those, the guard with earlier {last_tried_connect} time
+       have higher priority.
+
+     * Finally, among guards that do not appear in
+       {CONFIRMED_GUARDS} with {is_pending==false}, all have equal
+       priority.
+
+   ** Rationale **
+
+  We add elements to this ordering when we have actually used them
+  for building a usable circuit.  We could mark them at some other
+  time (such as when we attempt to connect to them, or when we
+  actually connect to them), but this approach keeps us from
+  committing to a guard before we actually use it for sensitive
+  traffic.
+
+4.4. The Primary guards [Section:PRIMARY]
+
+  We keep a run-time non-persistent ordered list of
+  {list:PRIMARY_GUARDS}.  It is a subset of {FILTERED_GUARDS}.  It
+  contains {N_PRIMARY_GUARDS} elements.
+
+  To compute primary guards, take the ordered intersection of
+  {CONFIRMED_GUARDS} and {FILTERED_GUARDS}, and take the first
+  {N_PRIMARY_GUARDS} elements.  If there are fewer than
+  {N_PRIMARY_GUARDS} elements, add additional elements to
+  PRIMARY_GUARDS chosen _uniformly_ at random from
+  ({FILTERED_GUARDS} - {CONFIRMED_GUARDS}).
+
+  Once an element has been added to {PRIMARY_GUARDS}, we do not remove it
+  until it is replaced by some element from {CONFIRMED_GUARDS}. Confirmed
+  elements always proceed unconfirmed ones in the {PRIMARY_GUARDS} list.
+
+  Note that {PRIMARY_GUARDS} do not have to be in
+  {USABLE_FILTERED_GUARDS}: they might be unreachable.
+
+    ** Rationale **
+
+  These guards are treated differently from other guards.  If one of
+  them is usable, then we use it right away. For other guards
+  {FILTERED_GUARDS}, if it's usable, then before using it we might
+  first double-check whether perhaps one of the primary guards is
+  usable after all.
+
+4.5. Retrying guards. [Section:RETRYING]
+
+  (We run this process as frequently as needed. It can be done once
+  a second, or just-in-time.)
+
+  If a primary sampled guard's {is_reachable} status is <no>, then
+  we decide whether to update its {is_reachable} status to <maybe>
+  based on its {last_tried_connect} time, its {failing_since} time,
+  and the {PRIMARY_GUARDS_RETRY_SCHED} schedule.
+
+  If a non-primary sampled guard's {is_reachable} status is <no>, then
+  we decide whether to update its {is_reachable} status to <maybe>
+  based on its {last_tried_connect} time, its {failing_since} time,
+  and the {GUARDS_RETRY_SCHED} schedule.
+
+    ** Rationale **
+
+  An observation that a guard has been 'unreachable' only lasts for
+  a given amount of time, since we can't infer that it's unreachable
+  now from the fact that it was unreachable a few minutes ago.
+
+4.6. Selecting guards for circuits. [Section:SELECTING]
+
+  Every origin circuit is now in one of these states:
+     <state:usable_on_completion>,
+     <state:usable_if_no_better_guard>,
+     <state:waiting_for_better_guard>, or
+     <state:complete>.
+
+  You may only attach streams to <complete> circuits.
+  (Additionally, you may only send RENDEZVOUS cells, ESTABLISH_INTRO
+  cells, and INTRODUCE cells on <complete> circuits.)
+
+  The per-circuit state machine is:
+
+      New circuits are <usable_on_completion> or
+      <usable_if_no_better_guard>.
+
+      A <usable_on_completion> circuit may become <complete>, or may
+      fail.
+
+      A <usable_if_no_better_guard> circuit may become
+      <usable_on_completion>; may become <waiting_for_better_guard>; or may
+      fail.
+
+      A <waiting_for_better_guard> circuit will become <complete>, or will
+      be closed, or will fail.
+
+      A <complete> circuit remains <complete> until it fails or is
+      closed.
+
+      Each of these transitions is described below.
+
+  We keep, as global transient state:
+
+    * {tvar:last_time_on_internet} -- the last time at which we
+      successfully used a circuit or connected to a guard.  At
+      startup we set this to "infinitely far in the past."
+
+  When we want to build a circuit, and we need to pick a guard:
+
+    * If any entry in PRIMARY_GUARDS has {is_reachable} status of
+      <maybe> or <yes>, return the first such guard. The circuit is
+      <usable_on_completion>.
+
+      [Note: We do not use {is_pending} on primary guards, since we
+      are willing to try to build multiple circuits through them
+      before we know for sure whether they work, and since we will
+      not use any non-primary guards until we are sure that the
+      primary guards are all down.  (XX is this good?)]
+
+    * Otherwise, if the ordered intersection of {CONFIRMED_GUARDS}
+      and {USABLE_FILTERED_GUARDS} is nonempty, return the first
+      entry in that intersection that has {is_pending} set to
+      false. Set its value of {is_pending} to true.  The circuit
+      is now <usable_if_no_better_guard>.  (If all entries have
+      {is_pending} true, pick the first one.)
+
+    * Otherwise, if there is no such entry, select a member at
+      random from {USABLE_FILTERED_GUARDS}. Set its {is_pending}
+      field to true.  The circuit is <usable_if_no_better_guard>.
+
+  We update the {last_tried_connect} time for the guard to 'now.'
+
+  In some cases (for example, when we need a certain directory feature,
+  or when we need to avoid using a certain exit as a guard), we need to
+  restrict the guards that we use for a single circuit. When this happens, we
+  remember the restrictions that applied when choosing the guard for
+  that circuit, since we will need them later (see [UPDATE_WAITING].).
+
+    ** Rationale **
+
+  We're getting to the core of the algorithm here.  Our main goals are to
+  make sure that
+    1. If it's possible to use a primary guard, we do.
+    2. We probably use the first primary guard.
+
+  So we only try non-primary guards if we're pretty sure that all
+  the primary guards are down, and we only try a given primary guard
+  if the earlier primary guards seem down.
+
+  When we _do_ try non-primary guards, however, we only build one
+  circuit through each, to give it a chance to succeed or fail.  If
+  ever such a circuit succeeds, we don't use it until we're pretty
+  sure that it's the best guard we're getting. (see below).
+
+         [XXX timeout.]
+
+4.7. When a circuit fails. [Section:ON_FAIL]
+
+   When a circuit fails in a way that makes us conclude that a guard
+   is not reachable, we take the following steps:
+
+      * We set the guard's {is_reachable} status to <no>.  If it had
+        {is_pending} set to true, we make it non-pending.
+
+      * We close the circuit, of course.  (This removes it from
+        consideration by the algorithm in [UPDATE_WAITING].)
+
+      * Update the list of waiting circuits.  (See [UPDATE_WAITING]
+        below.)
+
+   [Note: the existing Tor logic will cause us to create more
+   circuits in response to some of these steps; and also see
+   [ON_CONSENSUS].]
+
+    ** Rationale **
+
+   See [SELECTING] above for rationale.
+
+4.8. When a circuit succeeds [Section:ON_SUCCESS]
+
+   When a circuit succeeds in a way that makes us conclude that a
+   guard _was_ reachable, we take these steps:
+
+      * We set its {is_reachable} status to <yes>.
+      * We set its {failing_since} to "never".
+      * If the guard was {is_pending}, we clear the {is_pending} flag.
+      * If the guard was not a member of {CONFIRMED_GUARDS}, we add
+        it to the end of {CONFIRMED_GUARDS}.
+
+      * If this circuit was <usable_on_completion>, this circuit is
+        now <complete>. You may attach streams to this circuit,
+        and use it for hidden services.
+
+      * If this circuit was <usable_if_no_better_guard>, it is now
+        <waiting_for retry>.  You may not yet attach streams to it.
+        Then check whether the {last_time_on_internet} is more than
+        {INTERNET_LIKELY_DOWN_INTERVAL} seconds ago:
+
+           * If it is, then mark all {PRIMARY_GUARDS} as "maybe"
+             reachable.
+
+           * If it is not, update the list of waiting circuits. (See
+             [UPDATE_WAITING] below)
+
+   [Note: the existing Tor logic will cause us to create more
+   circuits in response to some of these steps; and see
+   [ON_CONSENSUS].]
+
+    ** Rationale **
+
+   See [SELECTING] above for rationale.
+
+4.9. Updating the list of waiting circuits [Section:UPDATE_WAITING]
+
+   We run this procedure whenever it's possible that a
+   <waiting_for_better_guard> circuit might be ready to be called
+   <complete>.
+
+   * If any circuit C1 is <waiting_for_better_guard>, AND:
+       * All primary guards have reachable status of <no>.
+       * There is no circuit C2 that "blocks" C1.
+     Then, upgrade C1 to <complete>.
+
+   Definition: In the algorithm above, C2 "blocks" C1 if:
+       * C2 obeys all the restrictions that C1 had to obey, AND
+       * C2 has higher priority than C1, AND
+       * Either C2 is <complete>, or C2 is <waiting_for_better_guard>,
+         or C2 has been <usable_if_no_better_guard> for no more than
+         {NONPRIMARY_GUARD_CONNECT_TIMEOUT} seconds.
+
+   We run this procedure periodically:
+
+   * If any circuit stays is <waiting_for_better_guard>
+     for more than {NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds,
+     time it out.
+
+      **Rationale**
+
+   If we open a connection to a guard, we might want to use it
+   immediately (if we're sure that it's the best we can do), or we
+   might want to wait a little while to see if some other circuit
+   which we like better will finish.
+
+
+   When we mark a circuit <complete>, we don't close the
+   lower-priority circuits immediately: we might decide to use
+   them after all if the <complete> circuit goes down before
+   {NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds.
+
+4.10.  Whenever we get a new consensus. [Section:ON_CONSENSUS]
+
+   We update {GUARDS}.
+
+   For every guard in {SAMPLED_GUARDS}, we update {IS_LISTED} and
+   {FIRST_UNLISTED_AT}.
+
+   [**] We remove entries from {SAMPLED_GUARDS} if appropriate,
+   according to the sampled-guards expiration rules.  If they were
+   in {CONFIRMED_GUARDS}, we also remove them from
+   {CONFIRMED_GUARDS}.
+
+   We recompute {FILTERED_GUARDS}, and everything that derives from
+   it, including {USABLE_FILTERED_GUARDS}, and {PRIMARY_GUARDS}.
+
+   (Whenever one of the configuration options that affects the
+   filter is updated, we repeat the process above, starting at the
+   [**] line.)
+
+4.11. Deciding whether to generate a new circuit.
+  [Section:NEW_CIRCUIT_NEEDED]
+
+   In current Tor, we generate a new circuit when we don't have
+   enough circuits either built or in-progress to handle a given
+   stream, or an expected stream.
+
+   For the purpose of this rule, we say that <waiting_for_better_guard>
+   circuits are neither built nor in-progress; that <complete>
+   circuits are built; and that the other states are in-progress.
+
+A. Appendices
+
+A.1.  Parameters with suggested values. [Section:PARAM_VALS]
+
+   (All suggested values chosen arbitrarily)
+
+   {param:MAX_SAMPLE_THRESHOLD} -- 20%
+
+   {param:MAX_SAMPLE_SIZE} -- 60
+
+   {param:GUARD_LIFETIME} -- 120 days
+
+   {param:REMOVE_UNLISTED_GUARDS_AFTER} -- 20 days
+     [previously ENTRY_GUARD_REMOVE_AFTER]
+
+   {param:MIN_FILTERED_SAMPLE} -- 20
+
+   {param:N_PRIMARY_GUARDS} -- 3
+
+   {param:PRIMARY_GUARDS_RETRY_SCHED}
+      -- every 30 minutes for the first 6 hours.
+      -- every 2 hours for the next 3.75 days.
+      -- every 4 hours for the next 3 days.
+      -- every 9 hours thereafter.
+
+   {param:GUARDS_RETRY_SCHED} -- 1 hour
+      -- every hour for the first 6 hours.
+      -- every 4 hours for the next 3.75 days.
+      -- every 18 hours for the next 3 days.
+      -- every 36 hours thereafter.
+
+   {param:INTERNET_LIKELY_DOWN_INTERVAL} -- 10 minutes
+
+   {param:NONPRIMARY_GUARD_CONNECT_TIMEOUT} -- 15 seconds
+
+   {param:NONPRIMARY_GUARD_IDLE_TIMEOUT} -- 10 minutes
+
+   {param:MEANINGFUL_RESTRICTION_FRAC} -- .2
+
+   {param:EXTREME_RESTRICTION_FRAC} -- .01
+
+   {param:GUARD_CONFIRMED_MIN_LIFETIME} -- 60 days
+
+A.2. Random values [Section:RANDOM]
+
+   Frequently, we want to randomize the expiration time of something
+   so that it's not easy for an observer to match it to its start
+   time. We do this by randomizing its start date a little, so that
+   we only need to remember a fixed expiration interval.
+
+   By RAND(now, INTERVAL) we mean a time between now and INTERVAL in
+   the past, chosen uniformly at random.
+
+
+A.3. Why not a sliding scale of primaryness? [Section:CVP]
+
+   At one meeting, I floated the idea of having "primaryness" be a
+   continuous variable rather than a boolean.
+
+   I'm no longer sure this is a great idea, but I'll try to outline
+   how it might work.
+
+   To begin with: being "primary" gives it a few different traits:
+
+      1) We retry primary guards more frequently. [Section:RETRYING]
+
+      2) We don't even _try_ building circuits through
+         lower-priority guards until we're pretty sure that the
+         higher-priority primary guards are down. (With non-primary
+         guards, on the other hand, we launch exploratory circuits
+         which we plan not to use if higher-priority guards
+         succeed.) [Section:SELECTING]
+
+      3) We retry them all one more time if a circuit succeeds after
+         the net has been down for a while. [Section:ON_SUCCESS]
+
+   We could make each of the above traits continuous:
+
+      1) We could make the interval at which a guard is retried
+         depend continuously on its position in CONFIRMED_GUARDS.
+
+      2) We could change the number of guards we test in parallel
+         based on their position in CONFIRMED_GUARDS.
+
+      3) We could change the rule for how long the higher-priority
+         guards need to have been down before we call a
+         <usable_if_no_better_guard> circuit <complete> based on a
+         possible network-down condition.  For example, we could
+         retry the first guard if we tried it more than 10 seconds
+         ago, the second if we tried it more than 20 seconds ago,
+         etc.
+
+   I am pretty sure, however, that if these are worth doing, they
+   need more analysis!  Here's why:
+
+      * They all have the potential to leak more information about a
+        guard's exact position on the list.  Is that safe? Is there
+        any way to exploit that?  I don't think we know.
+
+      * They all seem like changes which it would be relatively
+        simple to make to the code after we implement the simpler
+        version of the algorithm described above.
+
+A.3. Controller changes
+
+   We will add to control-spec.txt a new possible circuit state, GUARD_WAIT,
+   that can be given as part of circuit events and GETINFO responses about
+   circuits.  A circuit is in the GUARD_WAIT state when it is fully built,
+   but we will not use it because a circuit with a better guard might
+   become built too.
+
+A.4. Persistent state format
+
+   The persistent state format doesn't need to be part of this
+   proposal, since different implementations can do it
+   differently. Nonetheless, here's the one Tor uses:
+
+   The "state" file contains one Guard entry for each sampled guard
+   in each instance of the guard state (see section 2).  The value
+   of this Guard entry is a set of space-separated K=V entries,
+   where K contains any nonspace character except =, and V contains
+   any nonspace characters.
+
+   Implementations must retain any unrecognized K=V entries for a
+   sampled guard when the regenerate the state file.
+
+   The order of K=V entries is not allowed to matter.
+
+   Recognized fields (values of K) are:
+
+        "in" -- the name of the guard state instance that this
+        sampled guard is in.  If a sampled guard is in two guard
+        states instances, it appears twice, with a different "in"
+        field each time. Required.
+
+        "rsa_id" -- the RSA id digest for this guard, encoded in
+        hex. Required.
+
+        "bridge_addr" -- If the guard is a bridge, its configured
+        address and OR port. Optional.
+
+        "nickname" -- the guard's nickname, if any. Optional.
+
+        "sampled_on" -- the date when the guard was sampled. Required.
+
+        "sampled_by" -- the Tor version that sampled this guard.
+        Optional.
+
+        "unlisted_since" -- the date since which the guard has been
+        unlisted. Optional.
+
+        "listed" -- 0 if the guard is not listed ; 1 if it is. Required.
+
+        "confirmed_on" -- date when the guard was
+        confirmed. Optional.
+
+        "confirmed_idx" -- position of the guard in the confirmed
+        list. Optional.
+
+        "pb_use_attempts", "pb_use_successes", "pb_circ_attempts",
+        "pb_circ_successes", "pb_successful_circuits_closed",
+        "pb_collapsed_circuits", "pb_unusable_circuits",
+        "pb_timeouts" -- state for the circuit path bias algorithm,
+        given in decimal fractions. Optional.
+
+   All dates here are given as a (spaceless) ISO8601 combined date
+   and time in UTC (e.g., 2016-11-29T19:39:31).
+
+
+TODO. Still non-addressed issues [Section:TODO]
+
+   Simulate to answer:  Will this work in a dystopic world?
+
+   Simulate actual behavior.
+
+   For all lifetimes: instead of storing the "this began at" time,
+   store the "remove this at" time, slightly randomized.
+
+   Clarify that when you get a <complete> circuit, you might need to
+   relaunch circuits through that same guard immediately, if they
+   are circuits that have to be independent.
+
+
+   Fix all items marked XX or TODO.
+
+   "Directory guards" -- do they matter?
+
+       Suggestion: require that all guards support downloads via BEGINDIR.
+       We don't need to worry about directory guards for relays, since we
+       aren't trying to prevent relay enumeration.
+
+   IP version preferenes via ClientPreferIPv6ORPort
+
+       Suggestion: Treat it as a preference when adding to
+       {CONFIRMED_GUARDS}, but not otherwise.
+

+ 6 - 117
path-spec.txt

@@ -554,123 +554,12 @@ of their choices.
 
 5. Guard nodes
 
-  We use Guard nodes (also called "helper nodes" in the literature) to
-  prevent certain profiling attacks.  Here's the risk: if we choose entry and
-  exit nodes at random, and an attacker controls C out of N relays
-  (ignoring bandwidth), then the
-  attacker will control the entry and exit node of any given circuit with
-  probability (C/N)^2.  But as we make many different circuits over time,
-  then the probability that the attacker will see a sample of about (C/N)^2
-  of our traffic goes to 1.  Since statistical sampling works, the attacker
-  can be sure of learning a profile of our behavior.
-
-  If, on the other hand, we picked an entry node and held it fixed, we would
-  have probability C/N of choosing a bad entry and being profiled, and
-  probability (N-C)/N of choosing a good entry and not being profiled.
-
-  When guard nodes are enabled, Tor maintains an ordered list of entry nodes
-  as our chosen guards, and stores this list persistently to disk.  If a Guard
-  node becomes unusable, rather than replacing it, Tor adds new guards to the
-  end of the list.  When choosing the first hop of a circuit, Tor
-  chooses at
-  random from among the first NumEntryGuards (default 3) usable guards on the
-  list.  If there are not at least 2 usable guards on the list, Tor adds
-  routers until there are, or until there are no more usable routers to add.
-
-  A guard is unusable if any of the following hold:
-    - it is not marked as a Guard by the networkstatuses,
-    - it is not marked Valid (and the user hasn't set AllowInvalid entry)
-    - it is not marked Running
-    - Tor couldn't reach it the last time it tried to connect
-
-  A guard is unusable for a particular circuit if any of the rules for path
-  selection in 2.2 are not met.  In particular, if the circuit is "fast"
-  and the guard is not Fast, or if the circuit is "stable" and the guard is
-  not Stable, or if the guard has already been chosen as the exit node in
-  that circuit, Tor can't use it as a guard node for that circuit.
-
-  If the guard is excluded because of its status in the networkstatuses for
-  over 30 days, Tor removes it from the list entirely, preserving order.
-
-  If Tor fails to connect to an otherwise usable guard, it retries
-  periodically: every hour for six hours, every 4 hours for 3 days, every
-  18 hours for a week, and every 36 hours thereafter.  Additionally, Tor
-  retries unreachable guards the first time it adds a new guard to the list,
-  since it is possible that the old guards were only marked as unreachable
-  because the network was unreachable or down.
-
-  Tor does not add a guard persistently to the list until the first time we
-  have connected to it successfully.
-
-5.1. Guard selection algorithm
-
-  If configured to use entry guards, and the circuit's purpose is not marked
-  for testing, then a random entry guard from the persisted state (as
-  mentioned earlier in §5) will be chosen (provided there is already some
-  persisted state storing previously chosen guard nodes).
-
-  Otherwise, if any the above conditions are not satisfied, then a new entry
-  guard node will be chosen for that circuit.  The algorithm is as follows:
-
-    - EXCLUDED_NODES is a list of nodes which, for some reason, are not
-      acceptable for use as an entry guard.
-
-    1. If an exit node has been chosen for the circuit:
-
-       1.a. Then that exit is added to EXCLUDED_NODES (and thus will not be
-            used as the entry guard).
-
-    2. If running behind a fascist firewall (e.g. outgoing connections are
-       only permitted to ports 80 and/or 443):
-
-       2.a. For all known routers in the network (as given in the
-            networkstatus document), a router is added to the list of
-            EXCLUDED_NODES iff it does not advertise the ability to be reached
-            via the ports allowed through the fascist firewall.
-
-    3. Add any entry guards currently in volatile storage, as well as all
-       nodes within their families, to EXCLUDED_NODES.
-
-    4. Determine which of the following flags should apply to the selection of
-       an entry guard:
-
-         * CRN_NEED_UPTIME: the router can only be chosen as an entry guard
-           iff has been available for at least some minimum uptime.
-         * CRN_NEED_CAPACITY: potentially suitable routers are weighted by
-           their advertised bandwidth capacity.
-         * CRN_ALLOW_INVALID: also consider using routers which have been
-           marked as invalid.
-         * CRN_NEED_GUARD: only consider routers which have the Guard flag.
-         * CRN_NEED_DESC: only consider routers for which we have enough
-           information to be used to build a circuit.
-
-       Additionally, if configured to allow nodes marked as invalid AND to
-       specifically allow entry guards which have been marked as invalid, then
-       the CRN_ALLOW_INVALID flag will be set.  Lastly, the CRN_NEED_GUARD and
-       CRN_NEED_DESC flags are always applied, regardless of configuration.
-
-    5. If configured to exclude routers which allow single-hop circuits, then
-       the list of known routers is traversed, and all routers which permit
-       single-hop circuits are added to EXCLUDED_NODES.
-
-    6. If we are an OR, add ourselves (and our family) to EXCLUDED_NODES.
-
-    7. The list of potential routers is weighted according to the bandwidth
-       weights from the consensus (cf. §5.1.1), and then a random selection is
-       chosen with respect to those weights.
-
-       7.a. If we've made a choice now, the algorithm finishes.
-       7.b. Otherwise, continue to step #8.
-
-    8. We couldn't find a suitable guard, so now we try much harder by
-       discarding the CRN_NEED_UPTIME, CRN_NEED_CAPACITY, and CRN_NEED_GUARD
-       selection flags.  This effectively means we'll use nearly any router,
-       except for ones already in EXCLUDED_LIST.
-
-       [XXX Does this mean we even include BadExits and other misbehaving
-       nodes?  This sounds bad.  —isis]
-
-5.1.1. How consensus bandwidth weights factor into entry guard selection
+  We use Guard nodes (also called "helper nodes" in the research
+  literature) to prevent certain profiling attacks. For an overview of
+  our Guard selection algorithm -- which has grown rather complex -- see
+  guard-spec.txt.
+
+5.1. How consensus bandwidth weights factor into entry guard selection
 
   When weighting a list of routers for choosing an entry guard, the following
   consensus parameters (from the "bandwidth-weights" line) apply: