Bucket4j Rate Limiting 분석

대규모 마이크로서비스 환경에서 안정적인 Rate Limiting을 구현하기 위해 Bucket4j를 도입했습니다. 단순한 토큰 버킷을 넘어 분산 시스템에서의 복잡한 동시성 제어와 성능 최적화까지, 프로덕션 환경에서 겪은 실전 경험을 공유합니다.

1. Bucket4j 기반 Token Bucket 구현 분석

Bucket4j는 Vladimir Bukhtoyarov이 개발한 Java용 Rate Limiting 라이브러리로, Token Bucket 알고리즘을 절대적 정밀도(absolutely non-compromise precision)와 lock-free 멀티스레딩으로 구현했습니다.

1.1 Bucket4j의 Token Bucket 구현 특징

공식 문서에 따르면 Bucket4j는 다음과 같은 고유한 특징을 제공합니다:

정수 연산 정밀도: 부동소수점 연산 오차 없이 정확한 토큰 계산
멀티 대역폭 지원: 하나의 버킷에서 여러 제한 정책 동시 적용
Garbage Collection 최적화: 원시 타입 사용으로 메모리 할당 최소화

1.1.1 Bucket4j의 핵심 API 구조

Bucket4j Builder API는 직관적인 fluent interface를 제공합니다:

// 기본 로컬 버킷 생성
Bucket bucket = Bucket.builder()
    .addLimit(limit -> limit.capacity(50).refillGreedy(10, Duration.ofSeconds(1)))
    .build();

// 다중 대역폭 버킷
Bucket bucket = Bucket.builder()
    .addLimit(limit -> limit.capacity(50).refillGreedy(10, Duration.ofSeconds(1)))  // 초당 제한
    .addLimit(limit -> limit.capacity(1000).refillGreedy(100, Duration.ofMinutes(1))) // 분당 제한
    .build();

1.1.2 수학적 모델

토큰 충전 공식:

new_tokens = min(capacity, current_tokens + (current_time - last_refill_time) × refill_rate)

요청 허용 조건:

request_allowed = (required_tokens ≤ current_tokens)

1.1.3 알고리즘의 특징

Burst Handling: 짧은 시간 내 트래픽 급증을 버킷 용량 범위에서 허용
Smooth Rate Control: 장기적으로는 설정된 속도로 트래픽 제한
Memory Efficiency: 각 사용자당 단일 상태만 유지

1.2 Bucket4j Refill 전략 분석

공식 문서에서 제공하는 세 가지 토큰 리필 전략:

1.2.1 Greedy Refill (기본값)

가장 일반적인 방식으로, 토큰이 소비되는 순간 즉시 리필을 계산합니다:

Bucket bucket = Bucket.builder()
    .addLimit(limit -> limit
        .capacity(100)
        .refillGreedy(10, Duration.ofSeconds(1))  // 초당 10개 토큰 즉시 충전
    )
    .build();

1.2.2 Intervally Refill

고정된 시간 간격으로 배치 단위로 토큰을 충전합니다:

Bucket bucket = Bucket.builder()
    .addLimit(limit -> limit
        .capacity(100)
        .refillIntervally(10, Duration.ofSeconds(1))  // 정확히 1초마다 10개 토큰 충전
    )
    .build();

1.2.3 Bucket4j의 고급 기능들

Bucket4j 고급 API:

기능	설명	사용 사례
Verbose API	상세한 진단 정보 제공	디버깅, 모니터링
Listener API	토큰 소비 이벤트 감지	로깅, 메트릭 수집
Async API	비동기 토큰 소비	논블로킹 애플리케이션
Configuration Replacement	런타임 설정 변경	동적 Rate Limit 조정

1.3 Bucket4j의 내부 구현 세부사항

GitHub 저장소에서 확인할 수 있는 Bucket4j의 고성능 구현 특징:

1.3.1 정수 연산 기반 정밀도

Bucket4j는 “absolutely non-compromise precision“을 위해 모든 계산을 64비트 정수로 수행:

// io.github.bucket4j.local.LocalBucketState 클래스에서
private void refill(int bandwidthIndex, Bandwidth bandwidth, long currentTimeNanos) {
    // 나노초 정밀도로 시간 계산
    long durationSinceLastRefillNanos = currentTimeNanos - previousRefillNanos;
    
    // 정수 연산으로 토큰 수 계산 (부동소수점 오차 없음)
    long tokensToAdd = Math.multiplyExact(durationSinceLastRefillNanos, refillTokens) / refillPeriodNanos;
}

1.3.2 Lock-free 멀티스레딩 최적화

로컬 버킷의 경우 CAS(Compare-And-Swap) 연산을 사용하여 lock-free 동시성 제어:

// 원자적 상태 업데이트
public boolean tryConsumeAsMuchAsPossible(long limit) {
    while (true) {
        long currentState = stateRef.get();
        // ... 상태 계산 로직
        if (stateRef.compareAndSet(currentState, newState)) {
            return true;
        }
        // CAS 실패 시 재시도
    }
}

1.3.3 Garbage Collection 최적화

README에서 명시된 바와 같이 원시 타입 사용으로 GC 오버헤드 최소화:

// 객체 할당 대신 long 배열로 상태 관리
private final AtomicLongArray stateArray;

// 메모리 사용량: 8 bytes × 상태 필드 수
private static final int BANDWIDTH_SIZE = 4; // capacity, availableTokens, lastRefillTime, roundingError

2. Bucket4j 핵심 구현 분석

2.1 토큰 소비 메커니즘 (Token Consumption)

토큰 소비 요청 시 Bucket4j는 원자적 연산을 통해 다음 단계를 수행합니다:

나노초 정밀도로 현재 시간 측정
마지막 리필 시점부터 경과된 시간을 기반으로 토큰 리필
요청된 토큰 수와 사용 가능한 토큰 수 비교
토큰 소비 또는 거부 결정

@Override
protected boolean tryConsumeImpl(long tokensToConsume) {
  long currentTimeNanos = timeMeter.currentTimeNanos();
  lock.lock();
  try {
      state.refillAllBandwidth(currentTimeNanos);
      long availableToConsume = state.getAvailableTokens();
      if (tokensToConsume > availableToConsume) {
          return false;
      }
      state.consume(tokensToConsume);
      return true;
  } finally {
      lock.unlock();
  }
}

2.2 토큰 리필 전략 (Refill Strategies)

Bucket4j는 다양한 비즈니스 요구사항에 맞는 토큰 리필 전략을 제공합니다:

Greedy Refill: 즉시 토큰 충전 (기본값)
Intervally Refill: 고정 간격으로 배치 충전
Smooth Refill: 균등한 속도로 연속 충전

public void refillAllBandwidth(long currentTimeNanos) {
    Bandwidth[] bandwidths = configuration.getBandwidths();
    for (int i = 0; i < bandwidths.length; i++) {
        refill(i, bandwidths[i], currentTimeNanos);
    }
}

private void refill(int bandwidthIndex, Bandwidth bandwidth, long currentTimeNanos) {
    
    long previousRefillNanos = getLastRefillTimeNanos(bandwidthIndex);
    if (currentTimeNanos <= previousRefillNanos) {
        return;
    }

    // 주기적 리필 방식인경우 
    if (bandwidth.isRefillIntervally()) {
        // currentTimeNanos에 리필이 되어야하는 마지막 나노초까지의 타임스탬프를 지정
        long incompleteIntervalCorrection = (currentTimeNanos - previousRefillNanos) % bandwidth.getRefillPeriodNanos();
        currentTimeNanos -= incompleteIntervalCorrection;
    }
    if (currentTimeNanos <= previousRefillNanos) {
        return;
    } else {
        setLastRefillTimeNanos(bandwidthIndex, currentTimeNanos);
    }

    final long capacity = bandwidth.getCapacity();
    final long refillPeriodNanos = bandwidth.getRefillPeriodNanos();
    final long refillTokens = bandwidth.getRefillTokens();
    final long currentSize = getCurrentSize(bandwidthIndex);

    // 버킷이 가득찼는지 검사
    if (currentSize >= capacity) {
        // can come here if forceAddTokens has been used
        return;
    }

    long durationSinceLastRefillNanos = currentTimeNanos - previousRefillNanos;
    long newSize = currentSize;

    // 마지막 리필시간으로부터 현재 시간까지의 기간이 리필되어야하는 기간보다 큰경우, 즉 리필이 수행되어야하는경우
    if (durationSinceLastRefillNanos > refillPeriodNanos) {
        long elapsedPeriods = durationSinceLastRefillNanos / refillPeriodNanos;
        long calculatedRefill = elapsedPeriods * refillTokens;
        // 기존 버킷의 토큰수 + 리필되어야하는 토큰수
        newSize += calculatedRefill;
        // 리필후 토큰사이즈가 버킷의 토큰 용량보다 큰경우 용량 사이즈로 리필하고 종료
        if (newSize > capacity) {
            resetBandwidth(bandwidthIndex, capacity);
            return;
        }
        // 리필후 토큰사이즈가 현재 사이즈보다 작은경우 오버플로가 발생한것으로 판단하고 버킷 용량 사이즈로 리필하고 종료
        if (newSize < currentSize) {
            // arithmetic overflow happens. This mean that tokens reached Long.MAX_VALUE tokens.
            // just reset bandwidth state
            resetBandwidth(bandwidthIndex, capacity);
            return;
        }
        durationSinceLastRefillNanos %= refillPeriodNanos;
    }


    long roundingError = getRoundingError(bandwidthIndex);
    long dividedWithoutError = multiplyExactOrReturnMaxValue(refillTokens, durationSinceLastRefillNanos);
    long divided = dividedWithoutError + roundingError;
    if (divided < 0 || dividedWithoutError == Long.MAX_VALUE) {
        // arithmetic overflow happens.
        // there is no sense to stay in integer arithmetic when having deal with so big numbers
        long calculatedRefill = (long) ((double) durationSinceLastRefillNanos / (double) refillPeriodNanos * (double) refillTokens);
        newSize += calculatedRefill;
        roundingError = 0;
    } else {
        // 리필되어야하는 토큰수
        long calculatedRefill = divided / refillPeriodNanos;
        if (calculatedRefill == 0) {
            roundingError = divided;
        } else {
            newSize += calculatedRefill;
            roundingError = divided % refillPeriodNanos;
        }
    }

    if (newSize >= capacity) {
        resetBandwidth(bandwidthIndex, capacity);
        return;
    }
    if (newSize < currentSize) {
        // arithmetic overflow happens. This mean that bucket reached Long.MAX_VALUE tokens.
        // just reset bandwidth state
        resetBandwidth(bandwidthIndex, capacity);
        return;
    }
    
    // 새로계산된 토큰수로 버킷 토큰수 설정
    setCurrentSize(bandwidthIndex, newSize);
    setRoundingError(bandwidthIndex, roundingError);
}

3. 분산 환경에서의 Bucket4j 아키텍처

3.1 분산 캐시 통합 전략

대규모 마이크로서비스 환경에서 Bucket4j는 단일 인스턴스의 메모리 기반 제한을 넘어 Redis, Hazelcast, Coherence 등의 분산 캐시와 통합하여 글로벌 Rate Limiting을 구현합니다.

@Configuration
public class DistributedRateLimitConfig {
    
    @Bean
    public ProxyManager<String> bucketProxyManager(RedissonClient redissonClient) {
        return Bucket4j.extension(RedissonProxyManager.class)
            .builder()
            .withExpirationAfterWriteStrategy(ExpirationAfterWriteStrategy.basedOnTimeForRefillingBucketUpToMax(
                Duration.ofMinutes(10))) // TTL 최적화
            .build(redissonClient);
    }
    
    @Component
    public class DistributedRateLimiter {
        private final ProxyManager<String> buckets;
        
        public boolean isAllowed(String userId, long tokens) {
            Bucket bucket = buckets.builder()
                .addLimit(Bandwidth.classic(100, Refill.intervally(100, Duration.ofMinutes(1))))
                .build(userId, () -> createNewBucket());
                
            return bucket.tryConsume(tokens);
        }
    }
}

3.2 분산 동기화 메커니즘

분산 환경에서 rate limit 상태 동기화는 복잡한 동시성 제어 문제를 야기합니다. Bucket4j의 분산 구현은 다음과 같은 전략을 사용합니다:

3.2.1 Compare-and-Swap (CAS) 기반 낙관적 잠금

public class RedisBasedBucketState {
    
    private boolean updateStateWithCAS(String key, BucketState expectedState, BucketState newState) {
        String script = """
            local current = redis.call('GET', KEYS[1])
            if current == ARGV[1] then
                redis.call('SET', KEYS[1], ARGV[2])
                return 1
            else
                return 0
            end
            """;
            
        Long result = jedis.eval(script, 
            Collections.singletonList(key),
            Arrays.asList(serialize(expectedState), serialize(newState)));
            
        return result == 1;
    }
}

3.2.2 수학적 정확성을 위한 시간 동기화

분산 환경에서 가장 중요한 것은 시간 동기화입니다. 각 노드의 시계 차이 δ가 있을 때, refill 계산의 정확도는 다음과 같이 영향받습니다:

실제 토큰 수 = min(capacity, current_tokens + (time_diff - δ) × refill_rate)

네트워크 지연 λ와 시계 차이 δ를 고려한 최악의 경우 오차는:

최대 오차 = (λ + 2δ) × refill_rate

3.3 분산 처리 성능 분석

3.3.1 네트워크 오버헤드 계산

분산 Rate Limiting의 성능은 네트워크 RTT(Round Trip Time)에 크게 의존합니다. Redis의 성능 특성과 네트워크 지연 최적화를 고려해야 합니다.

public class PerformanceMetrics {
    
    // 단일 요청의 총 레이턴시
    // Total Latency = Network RTT + Redis Processing + Serialization
    public double calculateTotalLatency(double networkRtt, double redisProcessing, double serialization) {
        return networkRtt + redisProcessing + serialization;
    }
    
    // 초당 처리 가능한 최대 요청 수
    public double maxThroughput(double totalLatency, int connectionPoolSize) {
        return connectionPoolSize / totalLatency;
    }
}

일반적인 환경에서의 성능 벤치마크:

Local Cache: ~100,000 ops/sec
Redis (same DC): ~10,000 ops/sec
Redis (cross-region): ~1,000 ops/sec

3.3.2 메모리 사용량 최적화

분산 환경에서 각 bucket의 메모리 사용량을 최적화하는 것이 중요합니다:

public class OptimizedBucketState {
    // 기본 상태: 96 bytes (8개 long 필드)
    private long[] bandwidthStates = new long[8];
    
    // 압축된 상태: ~30% 메모리 절약
    public byte[] compressState() {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        for (long state : bandwidthStates) {
            buffer.putLong(state);
        }
        return compress(buffer.array());
    }
}

3.4 실전 분산 환경 설계 고려사항

3.4.1 Hot Partition 문제 해결

특정 사용자나 API에 트래픽이 집중될 때 발생하는 hot partition 문제를 해결하기 위한 Consistent Hashing 기반 샤딩 전략:

@Component
public class ShardedRateLimiter {
    private static final int SHARD_COUNT = 10;
    
    public boolean isAllowed(String userId, long tokens) {
        // 사용자별로 여러 샤드에 분산하여 부하 분산
        List<CompletableFuture<Boolean>> futures = new ArrayList<>();
        long tokensPerShard = tokens / SHARD_COUNT;
        long remainingTokens = tokens % SHARD_COUNT;
        
        for (int i = 0; i < SHARD_COUNT; i++) {
            String shardKey = userId + ":shard:" + i;
            long shardTokens = tokensPerShard + (i < remainingTokens ? 1 : 0);
            
            futures.add(CompletableFuture.supplyAsync(() -> 
                consumeFromShard(shardKey, shardTokens)));
        }
        
        return futures.stream()
            .allMatch(future -> {
                try {
                    return future.get(100, TimeUnit.MILLISECONDS);
                } catch (Exception e) {
                    return false; // Fail-safe
                }
            });
    }
}

3.4.2 Circuit Breaker 패턴 적용

분산 캐시 장애 시 Circuit Breaker 패턴을 활용한 resilient fallback 전략:

@Component
public class ResilientRateLimiter {
    private final CircuitBreaker circuitBreaker;
    private final LocalRateLimiter fallbackLimiter;
    
    public boolean isAllowed(String userId, long tokens) {
        return circuitBreaker.executeSupplier(() -> {
            // 분산 rate limiter 시도
            return distributedLimiter.isAllowed(userId, tokens);
        }).recover(throwable -> {
            // 장애 시 로컬 rate limiter로 fallback
            log.warn("Distributed rate limiter failed, using local fallback", throwable);
            return fallbackLimiter.isAllowed(userId, tokens);
        });
    }
}

3.4.3 멀티 리전 배포 전략

글로벌 서비스를 위한 지리적 분산 기반 멀티 리전 Rate Limiting:

public class GlobalRateLimiter {
    
    // 지역별 제한: 전체 제한의 60%, 글로벌 예비: 40%
    private static final double REGIONAL_QUOTA_RATIO = 0.6;
    private static final double GLOBAL_RESERVE_RATIO = 0.4;
    
    public boolean isAllowed(String userId, String region, long tokens) {
        // 1. 지역별 quota 먼저 확인
        String regionalKey = String.format("%s:region:%s", userId, region);
        long regionalLimit = (long) (totalLimit * REGIONAL_QUOTA_RATIO);
        
        if (consumeFromBucket(regionalKey, tokens, regionalLimit)) {
            return true;
        }
        
        // 2. 글로벌 예비 quota 확인
        String globalKey = String.format("%s:global", userId);
        long globalReserve = (long) (totalLimit * GLOBAL_RESERVE_RATIO);
        
        return consumeFromBucket(globalKey, tokens, globalReserve);
    }
}

3.5 성능 모니터링 및 알러팅

분산 Rate Limiting 시스템의 건강성을 모니터링하기 위한 Micrometer 기반 메트릭 수집:

@Component
public class RateLimitMetrics {
    private final MeterRegistry meterRegistry;
    
    public void recordRateLimitDecision(String userId, boolean allowed, double latency) {
        Timer.Sample sample = Timer.start(meterRegistry);
        sample.stop(Timer.builder("rate_limit_check")
            .tag("user", userId)
            .tag("allowed", String.valueOf(allowed))
            .register(meterRegistry));
            
        meterRegistry.counter("rate_limit_decisions",
            "result", allowed ? "allowed" : "denied")
            .increment();
    }
}

분산 환경에서의 Bucket4j는 단순한 토큰 버킷을 넘어 복잡한 분산 시스템의 동시성 제어, 네트워크 최적화, 장애 복구 등 다양한 엔지니어링 과제를 해결해야 합니다. 이러한 고려사항들을 통해 대규모 프로덕션 환경에서도 안정적이고 정확한 rate limiting을 구현할 수 있습니다.