The game is a broadcasting coordination game. There are N agents and C channels (both sliders), where N >= C. At each time step all agents choose a channel for broadcasting but, due to physical limitations, only one agent can actually transmit on each channel at a time. Thus, if an agent uniquely chooses a channel then he gets a utility of 1, if 2 or more agents choose the same channel they all receive a utility of 0.
You will first solve this problem by having all the agents implement the Q-learning algorithm. Note that there is only 1 state in this game, so the Q-learning will only happen over the actions of the players. Run it for a while and plot the sum of the utilities to see how well they do as a group.
Then, you will implement the algorithm from the paper (use the same NetLogo model, just add a toggle switch so the user can choose between the two). The algorithm is described in Section 2 and works roughly as follows (but, see the paper):
- At each tick, there is a randomly generated integer (signal), from 1 to K, that all agents can observe. They use this signal in their learning.
- Each agent keeps a table f(k), for k = 1..K and where f(k) is either the channel the agent will transmit if it sees signal k, or 0 if will not transmit at all when it sees that signal. Table f(k) is initialized to random channels.
- At each tick, with signal k, the agent transmits on channel f(k). If f(k) = 0 then the agent chooses a random channel to monitor.
- The agents get their utility based on the collisions. If the agent gets 0 utility then it sets f(k) = 0 with probability p (slider).
- If the agent was monitoring channel c then if no one transmitted on it the agent will set f(k) = c.
The .nlogo model is due on Monday, November 21 @9am.
No comments:
Post a Comment