Zachary Clement

A place to host my thoughts and side projects

julia_gift_exchange_simulation

Posted at — Oct 11, 2025

Simulation study

Simulation studies are used in statistics to estimate the probability of a certain event happening using pseudorandom numbers generated by a computer. Because it can be expensive and time-consuming to collect lots of real-world data on a phenomenon, statisticians sometimes use computers to create many simulated worlds and determine the frequency of an event happening in those simulated worlds. Simulation studies are used when it is difficult or impossible to analytically compute the probability of an event happening. For example, simulation is generally used to model processes happening on complex networks.

Current problem: christmas gift exchange

One of my favorite family traditions is our sibling gift draw: we each draw from a hat to choose a sibling to give a gift to. Then, on christmas, we all open gifts from each other one at a time.

My youngest sibling will usually open the first gift. Then, whoever gave the youngest sibling the gift opens their gift, and so on. This cycle of opening gifts continues until either we have all opened our gifts, or until someone opens a gift which was given to them by the youngest sibling. If this happens, we have to choose another person to open a gift, and start another cycle of gift-opening.

This year, something unusual happened: there were 4 “cycles” of gift-giving—the maximum number possible. Six of us had reciprocal gift assignments (they received a gift from the same person that they gave a gift to). This seemed unusual to me, but I wanted to know how unusual this event was. So, I set out to do a simulation study to assess how likely this was.

Step 1: simulate a data-generating process

In my family, gift assignments are given essentially at random. There are only two constraints: each person must give a gift to exactly one other person, and nobody can be assigned to give a gift to themselves. So, here, I’ll define a function which generates a possible arrangement of gift-giving.

using Random
function get_valid_selection(num_participants)
    selection_found = false
    selection = randperm(num_participants)
    while !selection_found
        no_self_assignments = true
        for i in 1:num_participants
            if selection[i] == i ## someone got themselves
                no_self_assignments = false
            end
        end
        if no_self_assignments
            selection_found = true
        else
            selection = randperm(num_participants)
        end
    end
    return selection
end
get_valid_selection (generic function with 1 method)

This function takes the number of family members and returns a vector saying who is assigned to give to whom. In this simulation, person 1 receives a gift from person 7, person 2 receives a gift to person 8, and so on.

get_valid_selection(9)
9-element Vector{Int64}:
 8
 7
 4
 3
 6
 1
 9
 5
 2

Step 2: Generate a metric from simulated data

Now that I can simulate a family gift drawing, I need to figure out how rare my event of having 4 cycles is. To count the number of cycles, I’ll define a function

function count_cycles(selection)
    num_participants = length(selection) # number of people in the drawing
    given_gifts = repeat([-1], num_participants) #a vector of placeholders to record who has given a gift already
    num_cycles = 1 ## we will always have at least one cycle
    current_opener = 1 ## the youngest person will start with opening their gift
    final_sum = sum(1:num_participants) ## our given_gifts vector will sum to this once everyone has been found
    while sum(given_gifts) != final_sum
        if !(current_opener in given_gifts) ## they haven't opened a gift yet, so we'll stay in the current cycle
            # Add them to the opened_gifts list to mark that we have to start a new cycle if they're
            # assigned to open a gift
            given_gifts[current_opener] = current_opener
            # Now, they open their gift, and the person who gave them the gift
            # opens the next gift 
            current_opener = selection[current_opener] 
        else # They've already given a gift, so we need to start a new cycle
            num_cycles = num_cycles + 1
            # choose someone who hasn't opened a gift yet to open the next gift.
            current_opener = findfirst(given_gifts .== -1) 
            # the .== means that we're comparing each element of given_gifts to -1

        end
    end
    return num_cycles
end
count_cycles (generic function with 1 method)

Step 3: Estimate the probability of an event happening in the real world.

Now, we run our simulation lots of times, and estimate the probability that we will see a given number of cycles. Here, I’m just using a for loop to run both get_valid_selection and count_cycles many times. Then, I create a frequency table to estimate the probability of getting 4 cycles. In this function, I will use the normal approximation of the binomial distribution to create a 95% confidence interval around the probabilities I estimate.

using FreqTables, Distributions
function estimate_probabilities(num_participants, num_simulations)
    num_cycles = repeat([-1], num_simulations) #create a placeholder vector for each simulation we do

    for i in 1:num_simulations
        selection = get_valid_selection(num_participants)
        num_cycles[i] = count_cycles(selection)
    end
    prop_table = prop(freqtable(num_cycles)) #create a table showing the frequency of getting given number of cycles
    println("Results for ", num_participants, " participants, using ", num_simulations, " simulations")
    println(prop_table)
    max_cycles = maximum(names(prop_table)[1])
    z_975 = quantile(Normal(0.0, 1.0), .975)
    p_hat = prop_table[max_cycles]
    lower_bound = p_hat - z_975 * sqrt(p_hat  * (1 - p_hat) / num_simulations)
    upper_bound = p_hat + z_975 * sqrt(p_hat  * (1 - p_hat) / num_simulations)
    println("95% confidence interval for probability of getting ", max_cycles, " cycles: [", 
        round(lower_bound, digits = 4), ", ", round(upper_bound, digits = 4), "]")
    
end

function estimate_probabilities_for_printing(num_participants, num_simulations)
    num_cycles = repeat([-1], num_simulations) #create a placeholder vector for each simulation we do

    for i in 1:num_simulations
        selection = get_valid_selection(num_participants)
        num_cycles[i] = count_cycles(selection)
    end
    prop_table = prop(freqtable(num_cycles)) #create a table showing the frequency of getting given number of cycles
    #println("Results for ", num_participants, " participants, using ", num_simulations, " simulations")
    #println(prop_table)
    max_cycles = maximum(names(prop_table)[1])
    z_975 = quantile(Normal(0.0, 1.0), .975)
    p_hat = prop_table[max_cycles]
    lower_bound = p_hat - z_975 * sqrt(p_hat  * (1 - p_hat) / num_simulations)
    upper_bound = p_hat + z_975 * sqrt(p_hat  * (1 - p_hat) / num_simulations)
    #println("95% confidence interval for probability of getting ", max_cycles, " cycles: [", 
    #    round(lower_bound, digits = 4), ", ", round(upper_bound, digits = 4), "]")
    print("\t",round(lower_bound, digits = 4),"\t",round(upper_bound, digits = 4), "\n")
    
end
estimate_probabilities_for_printing (generic function with 1 method)
estimate_probabilities(9, 1000000)
Results for 9 participants, using 1000000 simulations
4-element Named Vector{Float64}
Dim1  │ 
──────┼─────────
1     │ 0.303011
2     │ 0.480495
3     │ 0.197632
4     │ 0.018862
95% confidence interval for probability of getting 4 cycles: [0.0186, 0.0191]

It’s really quite remarkable how quickly Julia runs this simulation. I’m used to running simulations in R or Python which are a lot slower. It almost feels like Julia isn’t even running the simulations.

Analytical solution

Using simulation, I have a pretty good estimate for the probability of this event happening. However, let’s double-check our work using analytic techniques.

To do this, we need to compute the following fraction:

$\frac{\text{possible ways to get 4 cycles}}{\text{possible ways gifts could be assigned}}$

Let’s start with the denominator. To find the number of different ways a gift exchange could happen, we can use derangements, which estimate the number of permutations of a set such that no element of the set remains in its original position. We can estimate this according to the following formula:

$!n = n! \sum_{i = 0}^{n}\frac{(-1)^i}{i!}$

And, I’ll write this function to easily calculate this quantity:

function get_derangement(num_participants)

    sum = 0
    for i in 0:num_participants
        sum += (-1)^i / factorial(i)
    end
    return round(sum * factorial(num_participants))
end
get_derangement (generic function with 1 method)

Next, let’s figure out the numerator of that fraction.

We can think about this as choosing 2 people at random until that is no longer possible, because the maximum number of cycles for a given group of people will always involve mostly groups of 2.

If n is even, we can think about just choosing pairs of 2 people until people run out. However, when we do this, we can end up double-counting ways to assign gift pairs because we haven’t considered order. To fix this double-counting, we divide by the factorial of the number of pairs we selected. So, we can use the following formula if n is even:

$\frac{\prod_{i = 1}^{n / 2}{2i \choose 2}}{(n / 2)!}$

We can do a similar thing if n is odd. However, in this case, the last set chosen is a set of 3, so we don’t have to worry about double-counting for that set. But, we do have to consider that there are multiple ways to choose how people give others gifts in that last set of 3, so we will multiply by the derangement of 3 to account for that. So, we can use the following equation if n is odd:

$\frac{!3 \prod_{i = 1}^{(n- 3) / 2}{2i \choose 2}}{((n-3) / 2)!}$

function comb(n, k) #number of ways to choose k people from a group of n people
    factorial(n) / factorial(k) / factorial(n - k)
end

function get_numerator(num_participants)
    num_participants_left = num_participants
    numerator = 1
    num_2_pairs = 0 # what we have to divide by in the denominator
    while num_participants_left > 3
        numerator  = numerator * comb(num_participants_left, 2)
        num_participants_left = num_participants_left - 2
        num_2_pairs = num_2_pairs + 1
    end

    if num_participants_left == 3
        numerator = numerator * get_derangement(3) / factorial(num_2_pairs)
    else # in this case, we have just 2 participants left, so we have to include them as a choice of 2
        num_2_pairs = num_2_pairs + 1
        numerator = numerator * comb(2, 2) / factorial(num_2_pairs)
    end
end
get_numerator (generic function with 1 method)

Finally, let’s put these together into a function that calculates the proportion of individuals with the maximum number of cycles for a given group size.

function analytical_probability(num_participants)
    prob = get_numerator(num_participants) / get_derangement(num_participants)
    println("Actual probability: ", prob)
end

function analytical_probability_printing(num_participants)
    prob = get_numerator(num_participants) / get_derangement(num_participants)
    print(round(prob, digits = 4))
end
analytical_probability_printing (generic function with 1 method)
for i in 4:12
    print(i, "\t")
    analytical_probability_printing(i)
    estimate_probabilities_for_printing(i, 1000000)
    
end
4	0.3333	0.3329	0.3348
5	0.4545	0.4535	0.4555
6	0.0566	0.0565	0.0574
7	0.1133	0.1123	0.1136
8	0.0071	0.0069	0.0072
9	0.0189	0.0185	0.019
10	0.0007	0.0007	0.0008
11	0.0024	0.0022	0.0024
12	0.0001	0.0	0.0001
num = comb(4, 2) * comb(2, 2) / factorial(2)
den = get_derangement(4)
println("Numerator: ", num)
println("Denominator: ", den)
println(num / den)
Numerator: 3.0
Denominator: 9.0
0.3333333333333333

num = comb(5, 2) * get_derangement(3) / factorial(1)
den = get_derangement(5)
println("Numerator: ", num)
println("Denominator: ", den)
println(num / den)
Numerator: 20.0
Denominator: 44.0
0.45454545454545453
num = comb(6, 2) * comb(4, 2) * comb(2, 2) / factorial(3)
den = get_derangement(6)
println("Numerator: ", num)
println("Denominator: ", den)
println(num / den)
Numerator: 15.0
Denominator: 265.0
0.05660377358490566

num = comb(7, 2) * comb(5, 2) *  get_derangement(3) / factorial(2)
den = get_derangement(7)
println("Numerator: ", num)
println("Denominator: ", den)
println(num / den)
Numerator: 210.0
Denominator: 1854.0
0.11326860841423948

num = comb(8, 2) * comb(6, 2) * comb(4, 2) * comb(2, 2) / factorial(4)
den = get_derangement(8)

println("Numerator: ", num)
println("Denominator: ", den)
println(num / den)
Numerator: 105.0
Denominator: 14833.0
0.0070788107597923545
num = comb(9, 2) * comb(7, 2) * comb(5, 2) * get_derangement(3) / factorial(3)
den = get_derangement(9)
println("Numerator: ", num)
println("Denominator: ", den)
println(num / den)
Numerator: 2520.0
Denominator: 133496.0
0.01887697009648229
using Random
function get_valid_selection(num_participants)
    selection_found = false
    selection = randperm(num_participants)
    while !selection_found
        no_self_assignments = true
        for i in 1:num_participants
            if selection[i] == i ## someone got themselves
                no_self_assignments = false
            end
        end
        if no_self_assignments
            selection_found = true
        else
            selection = randperm(num_participants)
        end
    end
    return selection
end

function count_cycles(selection)
    num_participants = length(selection)
    found_members = repeat([-1], num_participants)
    num_cycles = 1 ## we have to have at least one cycle
    current_giver = 1 ## the first person to get a gift
    final_sum = sum(1:num_participants)
    while sum(found_members) != final_sum

        if !(current_giver in found_members) ## they haven't given a gift yet
            found_members[current_giver] = current_giver ## Add them to the current_giver list
            current_giver = selection[current_giver] ## The next person opens the gift
        else # They've already given a gift, so we need to start a new cycle
            num_cycles = num_cycles + 1
            current_giver = findfirst(found_members .== -1)

        end
    end
    return num_cycles
end
count_cycles (generic function with 2 methods)
using FreqTables
function estimate_probabilities(num_participants, num_simulations)
    num_cycles = repeat([-1], num_simulations)

    for i in 1:num_simulations
        selection = get_valid_selection(num_participants)
        num_cycles[i] = count_cycles(selection)
    end
    table = freqtable(num_cycles)
    println("Table for ", num_participants, " participants")
    println(prop(table))
end
estimate_probabilities (generic function with 1 method)
estimate_probabilities(9, 100000)
Table for 9 participants
4-element Named Vector{Float64}
Dim1  │ 
──────┼────────
1     │ 0.30412
2     │ 0.47919
3     │ 0.19751
4     │ 0.01918
using DataFrames
DataFrame(name = names(table), probability = table)
DimensionMismatch: column :name has length 1 and column :probability has length 2



Stacktrace:

 [1] DataFrame(columns::Vector{Any}, colindex::DataFrames.Index; copycols::Bool)

   @ DataFrames ~/.julia/packages/DataFrames/dgZn3/src/dataframe/dataframe.jl:206

 [2] DataFrame(; kwargs::Base.Pairs{Symbol, AbstractVector, Tuple{Symbol, Symbol}, NamedTuple{(:name, :probability), Tuple{Vector{Vector{Int64}}, NamedArrays.NamedVector{Float64, Vector{Float64}, Tuple{OrderedCollections.OrderedDict{Int64, Int64}}}}}})

   @ DataFrames ~/.julia/packages/DataFrames/dgZn3/src/dataframe/dataframe.jl:326

 [3] top-level scope

   @ In[71]:2

 [4] eval

   @ ./boot.jl:368 [inlined]

 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)

   @ Base ./loading.jl:1428
println(table)
2-element Named Vector{Float64}
Dim1  │ 
──────┼─────
1     │ 0.75
2     │ 0.25
x x 3 4 5
x 2 x 4 5
x 2 3 x 5
x 2 3 4 x
1 x x 4 5
1 x 3 x 5
1 x 3 4 x
1 2 x x 5
1 2 x 4 x
1 2 3 x x