🟦MDE#

alt text

ΠœΠΎΡ‰Π½ΠΎΡΡ‚ΡŒΡŽ статистичСского критСрия Π½Π°Π·Ρ‹Π²Π°ΡŽΡ‚ Π²Π΅Ρ€ΠΎΡΡ‚Π½ΠΎΡΡ‚ΡŒ ΠΏΡ€ΠΈΠ½ΡΡ‚ΡŒ Π°Π»ΡŒΡ‚Π΅Ρ€Π½Π°Ρ‚ΠΈΠ²Π½ΡƒΡŽ Π³ΠΈΠΏΠΎΡ‚Π΅Π·Ρƒ Π² случаС, Ссли ΠΎΠ½Π° Π²Π΅Ρ€Π½Π°.

MDE - это ΠΌΠΈΠ½ΠΈΠΌΠ°Π»ΡŒΠ½Ρ‹ΠΉ эффСкт, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹ΠΉ ΠΌΠΎΠΆΠ½ΠΎ ΠΎΠ±Π½Π°Ρ€ΡƒΠΆΠΈΡ‚ΡŒ ΠΏΡ€ΠΈ Π²Ρ‹Π±Ρ€Π°Π½Π½ΠΎΠΌ ΡƒΡ€ΠΎΠ²Π½Π΅ значимости ΠΈ мощности.

ΠœΠ°Ρ‚Π΅ΠΌΠ°Ρ‚ΠΈΡ‡Π΅ΡΠΊΠΎΠ΅ обоснованиС#

ΠŸΡƒΡΡ‚ΡŒ \(X_1, X_2, ..., X_n \sim N(\mu, \sigma^2_0)\)

Π“ΠΈΠΏΠΎΡ‚Π΅Π·Ρ‹:

  • \(H0: \mu = \mu_0\)

  • \(H1: \mu = \mu_1\), (\(\mu_0 < \mu_1\))

Π’ΠΎΡΠΏΠΎΠ»ΡŒΠ·ΡƒΠ΅ΠΌΡΡ ΠΊΡ€ΠΈΡ‚Π΅Ρ€ΠΈΠ΅ΠΌ ΠΎΡ‚Π½ΠΎΡˆΠ΅Π½ΠΈΡ правдоподобия:

\[ T(X) = \frac{L_1}{L_0} = \frac { \prod_{i=1}^{n} \frac{1}{\sigma_0\sqrt{2\pi}}\, exp\{\frac{-(X_i - \mu_1)^2}{2\sigma^2_0}\} } { \prod_{i=1}^{n} \frac{1}{\sigma_0\sqrt{2\pi}}\, exp\{\frac{-(X_i - \mu_0)^2}{2\sigma^2_0}\} }= \]
\[ =exp\left\{\frac{1}{2\sigma^2_0}\left(\sum_{i=1}^n 2X_i(\mu_1 - \mu_0) + \sum_{i=1}^n \left(\mu_1 - \mu_0\right)^2\right) \right\} \]

Π­Ρ‚Π° Π²Π΅Π»ΠΈΡ‡ΠΈΠ½Π° зависит ΠΎΡ‚ Π²Ρ‹Π±ΠΎΡ€ΠΊΠΈ Π² ΠΎΠ΄Π½ΠΎΠΌ мСстС. ΠšΡ€ΠΈΡ‚Π΅Ρ€ΠΈΠΉ ΠΈΠΌΠ΅Π΅Ρ‚ Π²ΠΈΠ΄ \(T(X) \geq c^*(\alpha)\) ΠΈΠ»ΠΈ \(\sum_{i=1}^n X_i \geq c(\alpha)\), Π³Π΄Π΅ \(\alpha\) - ΡƒΡ€ΠΎΠ²Π΅Π½ΡŒ статистичСской значимости.

ΠœΡ‹ Ρ…ΠΎΡ‚ΠΈΠΌ Π·Π°Ρ„ΠΈΠΊΡΠΈΡ€ΠΎΠ²Π°Ρ‚ΡŒ ΠΎΡˆΠΈΠ±ΠΊΡƒ 1 Ρ€ΠΎΠ΄Π°: \(P(\sum X_i \geq q| H_0) \leq \alpha\). Π’.Π΅. Ρ…ΠΎΡ‚ΠΈΠΌ ΠΏΠΎΠ΄ΠΎΠ±Ρ€Π°Ρ‚ΡŒ Ρ‚Π°ΠΊΡƒΡŽ \(c\), Ρ‡Ρ‚ΠΎΠ±Ρ‹ наша статистика Π²Ρ‹Ρ…ΠΎΠ΄ΠΈΠ»Π° Π·Π° Π΅Π΅ ΠΏΡ€Π΅Π΄Π΅Π»Ρ‹ с Π²Π΅Ρ€ΠΎΡΡ‚Π½ΠΎΡΡ‚ΡŒΡŽ \(\leq \alpha\).

По ЦПВ \(\sum X_i \sim N(n\mu, nVar(X_i))\). Π’ нашСм случаС \(\sum X_i \sim N(n\mu, n\sigma^2_0)\)

ΠžΡ‚Π½ΠΎΡ€ΠΌΠΈΡ€ΡƒΠ΅ΠΌ ΠΈ рассмотрим ΠΊΡ€Π°ΠΉΠ½ΠΈΠΉ случай, ΠΊΠΎΠ³Π΄Π° \(P(\cdot) = \alpha\):

\[ P\left(\frac{\sum X_i - n\mu_0}{\sqrt n \sigma_0} \geq \frac{c-n\mu_0}{\sqrt n \sigma_0}\right) = \alpha \]

ΠŸΠ΅Ρ€Π΅ΠΏΠΈΡˆΠ΅ΠΌ Π² ΡΠ»Π΅Π΄ΡƒΡŽΡ‰Π΅ΠΌ Π²ΠΈΠ΄Π΅:

\[ 1-F_{N(0, 1)}\left(\frac{c - n\mu_0}{\sqrt{n}\sigma_0}\right) = \alpha \]

Π’Ρ‹Ρ€Π°Π·ΠΈΠΌ \(c\):

\[ c= F^{-1}_{N(0, 1)}(1-\alpha)\sqrt n \sigma_0 + n\mu_0 \]

Π—Π°ΠΌΠ΅Ρ‚ΠΈΠΌ, Ρ‡Ρ‚ΠΎ \(c\) Π½Π΅ зависит ΠΎΡ‚ \(\mu_1\) ΠΈ \(\forall \mu_1: \mu_1 > \mu_0\).

РазбСрСмся Ρ‚Π΅ΠΏΠ΅Ρ€ΡŒ с ошибкой 2 Ρ€ΠΎΠ΄Π°, Ρ‚.Π΅. \(P(\sum X_i \geq c | H_1) \geq 1-\beta\) РаспишСм:

\[ P\left(\frac{\sum X_i - n\mu_1}{\sqrt n \sigma_0} \geq \frac{c-n\mu_1}{\sqrt n \sigma_0}\right) \geq 1-\beta \]

ΠŸΠΎΠ΄ΡΡ‚Π°Π²ΠΈΠΌ сюда \(c\), ΠΏΠΎΠ»ΡƒΡ‡Π΅Π½Π½ΠΎΠ΅ Ρ€Π°Π½Π΅Π΅:

\[ P\left(\frac{\sum X_i - n\mu_1}{\sqrt n \sigma_0} \geq\frac{F^{-1}_{N(0, 1)}(1-\alpha)\sqrt n \sigma_0 + n\mu_0 - n\mu_1}{\sqrt n \sigma_0}\right) \geq 1-\beta \]
\[ 1 - F_{N(0, 1)}\left[F^{-1}_{N(0, 1)}(1-\alpha) + \frac{\sqrt n(\mu_0 - \mu_1)}{\sigma_0}\right] \geq 1-\beta \]
\[ \beta \geq F_{N(0, 1)}\left[F^{-1}_{N(0, 1)}(1-\alpha) + \frac{\sqrt n(\mu_0 - \mu_1)}{\sigma_0}\right] \]
\[ F^{-1}_{N(0, 1)}(\beta) \geq F^{-1}_{N(0, 1)}(1-\alpha) + \frac{\sqrt n(\mu_0 - \mu_1)}{\sigma_0} \]

ΠŸΡƒΡΡ‚ΡŒ \(\mu_1 - \mu_0 = \varepsilon\) - ΠΎΠΆΠΈΠ΄Π°Π΅ΠΌΡ‹ΠΉ эффСкт Π² Π°Π±ΡΠΎΠ»ΡŽΡ‚Π½Ρ‹Ρ… Π²Π΅Π»ΠΈΡ‡ΠΈΠ½Π°Ρ….

\[ F^{-1}_{N(0, 1)}(\beta) \geq F^{-1}_{N(0, 1)}(1-\alpha) + \frac{\sqrt n\varepsilon}{\sigma_0} \]

Π’Ρ‹Ρ€Π°Π·ΠΈΠΌ \(\varepsilon\):

\[ \varepsilon \geq \left[F^{-1}_{N(0, 1)}(1-\alpha) - F^{-1}_{N(0, 1)}(\beta)\right] \cdot \frac{\sigma_0}{{\sqrt n}} \]

ПокаТСм, Ρ‡Ρ‚ΠΎ \(F^{-1}_{N(0, 1)}(\beta) = - F^{-1}_{N(0, 1)}(1-\beta)\)

ΠŸΡƒΡΡ‚ΡŒ \(F_{N(0, 1)}(x) = \beta\). Π˜Π·Π²Π΅ΡΡ‚Π½ΠΎ, Ρ‡Ρ‚ΠΎ \(F_{N(0, 1)}(x) + F_{N(0, 1)}(-x) = 1\), Ρ‚ΠΎΠ³Π΄Π° \(F_{N(0, 1)}(-x) = 1 - F_{N(0, 1)}(x) = 1 - \beta\) \(x = F_{N(0, 1)}(1-\beta)\) Π‘ Π΄Ρ€ΡƒΠ³ΠΎΠΉ стороны: \(x = F_{N(0, 1)}(\beta)\) ΠŸΠΎΠ΄ΡΡ‚Π°Π²ΠΈΠ², ΠΏΠΎΠ»ΡƒΡ‡ΠΈΠΌ Π΄ΠΎΠΊΠ°Π·Ρ‹Π²Π°Π΅ΠΌΠΎΠ΅ равСнство

\[ \varepsilon \geq \left[F^{-1}_{N(0, 1)}(1-\alpha) + F^{-1}_{N(0, 1)}(1-\beta)\right] \cdot \frac{\sigma_0}{{\sqrt n}} \]

ΠŸΠΎΠΊΠ°Π·Ρ‹Π²Π°Π΅Ρ‚, ΠΊΠ°ΠΊΠΎΠΉ Ρ€Π°Π·ΠΌΠ΅Ρ€ эффСкта \(\varepsilon\) ΠΌΡ‹ способны ΠΎΠ±Π½Π°Ρ€ΡƒΠΆΠΈΡ‚ΡŒ ΠΏΡ€ΠΈ Π·Π°Π΄Π°Π½Π½Ρ‹Ρ… \(\alpha\) ΠΈ \(\beta\), Π²Ρ‹Π±ΠΎΡ€ΠΊΠΈ Ρ€Π°Π·ΠΌΠ΅Ρ€Π° \(n\) ΠΈ диспСрсиями \(\sigma_0\).

РСзюмС

MDE - ΠΌΠΈΠ½ΠΈΠΌΠ°Π»ΡŒΠ½Ρ‹ΠΉ эффСкт, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹ΠΉ ΠΌΠΎΠΆΠ΅ΠΌ ΠΏΠΎΠΉΠΌΠ°Ρ‚ΡŒ.

  • \(\varepsilon\) - Ρ€Π°Π·ΠΌΠ΅Ρ€ эффСкта (Π² Π°Π±ΡΠΎΠ»ΡŽΡ‚Π½Ρ‹Ρ… Π²Π΅Π»ΠΈΡ‡ΠΈΠ½Π°Ρ…)

  • \(\alpha\) - допустимая ошибка 1 Ρ€ΠΎΠ΄Π°

  • \(\beta\) - допустимая ошибка 2 Ρ€ΠΎΠ΄Π°

  • \(\sigma^2_x, \sigma^2_y\) - диспСрсии Π²Ρ‹Π±ΠΎΡ€ΠΎΠΊ

  • \(n\) - Ρ€Π°Π·ΠΌΠ΅Ρ€ Π²Ρ‹Π±ΠΎΡ€ΠΊΠΈ

\[ \varepsilon^2 \geq \left[F^{-1}_{N(0, 1)}(1-\alpha) + F^{-1}_{N(0, 1)}(1-\beta)\right]^2 \cdot \frac{(\sigma^2_x+ \sigma^2_y)}{n} \]
\[ \varepsilon \geq \left[F^{-1}_{N(0, 1)}(1-\alpha) + F^{-1}_{N(0, 1)}(1-\beta)\right] \cdot \sqrt{\frac{\sigma^2_x}{n_x}+ \frac{\sigma^2_y}{n_y}} \]

ΠžΡ†Π΅Π½ΠΈΠΌ Ρ€Π°Π·ΠΌΠ΅Ρ€ Π²Ρ‹Π±ΠΎΡ€ΠΊΠΈ, ΠΊΠΎΡ‚ΠΎΡ€Ρ‹ΠΉ Π½Π΅ΠΎΠ±Ρ…ΠΎΠ΄ΠΈΠΌ, Ρ‡Ρ‚ΠΎΠ±Ρ‹ ΠΎΠ±Π½Π°Ρ€ΡƒΠΆΠΈΡ‚ΡŒ ΠΎΠΆΠΈΠ΄Π°Π΅ΠΌΡ‹ΠΉ эффСкт ΠΏΡ€ΠΈ фиксированных ΠΎΡˆΠΈΠ±ΠΊΠ°Ρ… ΠΏΠ΅Ρ€Π²ΠΎΠ³ΠΎ ΠΈ Π²Ρ‚ΠΎΡ€ΠΎΠ³ΠΎ Ρ€ΠΎΠ΄Π°:

\[ n \geq \left[F^{-1}_{N(0, 1)}(1-\alpha) + F^{-1}_{N(0, 1)}(1-\beta)\right]^2 \frac{\sigma^2_x + \sigma^2_y}{\varepsilon^2} \]

По сути, Π²Π»ΠΈΡΡ‚ΡŒ ΠΌΡ‹ ΠΌΠΎΠΆΠ΅ΠΌ Ρ‚ΠΎΠ»ΡŒΠΊΠΎ Π½Π° диспСрсии. А ΠΎΠ± этом дальшС.

import math
from IPython.display import display

import numpy as np
from scipy import stats
import pandas as pd
class Mde:
    
    @classmethod
    def get_diff(
        cls, 
        first_type_error: float, 
        second_type_error: float, 
        n_x: int, 
        std_x: float, 
        n_y: int = None, 
        std_y: float = None, 
        two_sided: bool = True
    ) -> float:
        if std_y is None:
            std_y = std_x
        if n_y is None:
            n_y = n_x

        f_alpha = stats.norm(0, 1).ppf(1-first_type_error/(1+two_sided))
        f_beta = stats.norm(0, 1).ppf(1-second_type_error)
        se = math.sqrt(std_x**2 / n_x + std_y**2/n_y)
        
        return (f_alpha + f_beta) * se
    
    @classmethod
    def get_sample_size_abs(
        cls, 
        effect: float, 
        first_type_error: float, 
        second_type_error: float, 
        std_x: float, 
        std_y: float = None, 
        two_sided: bool = True
    ) -> int:
        if std_y is None:
            std_y = std_x

        f_alpha = stats.norm(0, 1).ppf(1-first_type_error/(1+two_sided))
        f_beta = stats.norm(0, 1).ppf(1-second_type_error)

        norm_std = (std_x**2 + std_y**2) / effect**2
        return math.ceil((f_alpha + f_beta)**2 * norm_std)

    @classmethod
    def get_sample_size_arb(
        cls, 
        mu: float,
        effect_percantage: float, 
        first_type_error: float, 
        second_type_error: float, 
        std_x: float, 
        std_y: float = None, 
        two_sided: bool = True
    ) -> int:
        new_mu = mu * (1 + effect_percantage)
        return cls.get_sample_size_abs(new_mu - mu, first_type_error, second_type_error, std_x, std_y, two_sided)
    
    @classmethod
    def get_sample_size_table(
        cls, 
        mu: float,
        std_x: float, 
        effect_percantage: list[int] = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1], 
        first_type_error: float = [0.01, 0.03, 0.05, 0.1], 
        second_type_error: float = [0.05, 0.1, 0.15, 0.2, 0.25], 
        std_y: float = None, 
        two_sided: bool = True
    ) -> None:
        
        multi_index = []
        values = []
        for alpha in first_type_error:
            for effect in effect_percantage:
                betas = []
                multi_index.append((alpha, f'{int(effect*100)}%'))
                for beta in second_type_error:
                    betas.append(
                        cls.get_sample_size_arb(
                            mu=mu,
                            effect_percantage=effect,
                            first_type_error=alpha,
                            second_type_error=beta,
                            std_x=std_x,
                            std_y=std_y,
                            two_sided=two_sided
                        )
                    )
                values.append(betas)
        multi_index = pd.MultiIndex.from_tuples(multi_index, names=['alpha', 'effect'])
        
        with pd.option_context('display.max_rows', None):
            display(pd.DataFrame(
                values, 
                index=multi_index, 
                columns=[f'beta: {second_type_error[0]}'] + second_type_error[1:])
            )
        
        return None
        
Mde.get_sample_size_table(10, 1)
beta: 0.05 0.1 0.15 0.2 0.25
alpha effect
0.01 1% 3563 2976 2610 2336 2113
2% 891 744 653 584 529
3% 396 331 290 260 235
4% 223 186 164 146 133
5% 143 120 105 94 85
6% 99 83 73 65 59
7% 73 61 54 48 44
8% 56 47 41 37 34
9% 44 37 33 29 27
10% 36 30 27 24 22
0.03 1% 2911 2383 2057 1815 1619
2% 728 596 515 454 405
3% 324 265 229 202 180
4% 182 149 129 114 102
5% 117 96 83 73 65
6% 81 67 58 51 45
7% 60 49 42 38 34
8% 46 38 33 29 26
9% 36 30 26 23 20
10% 30 24 21 19 17
0.05 1% 2599 2102 1796 1570 1389
2% 650 526 449 393 348
3% 289 234 200 175 155
4% 163 132 113 99 87
5% 104 85 72 63 56
6% 73 59 50 44 39
7% 54 43 37 33 29
8% 41 33 29 25 22
9% 33 26 23 20 18
10% 26 22 18 16 14
0.10 1% 2165 1713 1438 1237 1076
2% 542 429 360 310 269
3% 241 191 160 138 120
4% 136 108 90 78 68
5% 87 69 58 50 44
6% 61 48 40 35 30
7% 45 35 30 26 22
8% 34 27 23 20 17
9% 27 22 18 16 14
10% 22 18 15 13 11