Contributing to Open Source (Open Source Contributor)

Overview

Through the Open Source Contribution Academy, I had the chance to contribute to the Exporterhub.io open-source project.

For issue #68 of the project, I took on a backend task. The backend would build a list of the contents stored for each exporter and send it to the front end, so the front end could show which contents each exporter has.

Setting Up the Development Environment

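Before I could start developing, I had to set up the development environment. This project runs the app with four containers up at the same time. Until then I had only ever run a single container with Docker, so I had no idea how to set up a development environment with multiple containers running.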

So I paid for the Inflearn course "따라하며 배우는 도커와 CI환경" (Learn Docker and CI by Following Along) and studied Docker and Docker Compose.

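After two weeks of cramming and trial and error, I finally got the app running with my local code mounted into the container. In the end, all it took was to follow the project's Install Guide carefully, mount the container I was developing in to my local directory in docker-compose.yml, and start everything with docker-compose up.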

docker-compose.yml

```yaml
version: "3.1"
services:
  api:
    volumes:
      - ./api:/data
```

Exporter Contents List Data

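Each exporter, which extracts metrics, can have the following contents:

Hands on (Default)
IaC
Alerting rule
Grafana dashboard

Hands on holds the exporter's basic information and is saved to the DB when the exporter is registered. The other contents are stored in a GitHub repo that serves as a kind of database, so checking whether they exist means a request/response round trip through the GitHub API every time.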
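To make the data shape concrete: the code in this post records each exporter's contents as single-letter flags (`E`, `I`, `A`, `G`) and sets the `I`/`A`/`G` flags from file-name suffixes (`helm`, `alert`, `dashboard`). The minimal sketch below isolates that mapping from the GitHub calls. Reading `E` as Hands on (it is always `True`) and `I`/`A`/`G` as IaC/Alerting rule/Grafana dashboard is inferred from the suffixes, the file names are made up, and `content_code` / `build_flags` are hypothetical helpers, not project code.

```python
# Assumed legend: E=Hands on, I=IaC (helm), A=Alerting rule, G=Grafana dashboard
SUFFIX_TO_CODE = {'helm': 'I', 'alert': 'A', 'dashboard': 'G'}


def content_code(path):
    """Map a file path such as 'node_exporter_alert.yaml' to its content code, or None."""
    # Take the part after the last '_' and drop the extension: '..._alert.yaml' -> 'alert'
    suffix = path.split("_")[-1].split(".")[0].strip()
    return SUFFIX_TO_CODE.get(suffix)


def build_flags(paths):
    """Build one exporter's flag dict from the file paths in its directory."""
    # 'E' is always True because Hands on is stored in the DB at registration
    flags = {'E': True, 'I': False, 'A': False, 'G': False}
    for path in paths:
        code = content_code(path)
        if code is not None:
            flags[code] = True
    return flags


print(build_flags(["README.md", "node_exporter_helm.yaml", "node_exporter_alert.yaml"]))
# {'E': True, 'I': True, 'A': True, 'G': False}
```

Keeping the suffix-to-code mapping in one place means the view and the scheduler cannot drift apart on which key they set.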

So, to report whether each exporter has each type of content when the main page loads, I processed the GitHub API responses into that data and pushed my first commit.

Adding the `get_exp_contents` function

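```python
import requests
from django.conf import settings

# Method on the view class; Token is the project's model storing a registered GitHub token.
def get_exp_contents(self, user):
    if user:
        github_token = user.github_token
    else:
        last_token = Token.objects.last()
        github_token = last_token.token if last_token else None
    if not github_token:
        # No token available yet, so the GitHub API cannot be queried
        return {}
    headers = {'Authorization': 'token ' + github_token}
    repo = f"{settings.ORGANIZATION}/exporterhub.io"
    # One request for the list of exporter directories under contents/
    exp_lst = requests.get(f"https://api.github.com/repos/{repo}/contents/contents/", headers=headers)
    git_exp_list = exp_lst.json()

    # File-name suffix -> single-letter content code
    content_type = {'helm': 'I', 'alert': 'A', 'dashboard': 'G'}
    exporter_content = {}

    for exp in git_exp_list:
        app_name = exp['name']
        if app_name != 'README.md':
            exporter_content[app_name] = {'E': True, 'I': False, 'A': False, 'G': False}
            # One more request per exporter to list the files in its directory
            response = requests.get(exp["git_url"], headers=headers)
            exp_data = response.json()

            for item in exp_data['tree'][::2]:
                content = item['path'].split("_")[-1].split(".")[0].strip()
                if content in content_type:
                    # Store under the code ('I'/'A'/'G'), not the raw suffix
                    exporter_content[app_name][content_type[content]] = True

    return exporter_content
```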

Latency issue

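The function produced the data we needed, but it sends a separate GitHub API request for every exporter and waits for each response in turn, so building the data took about 6 seconds. That introduced a latency problem.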

Improving Performance with Caching

To fix the latency problem, I first tried asynchronous processing. After talking with Ralph, the project's mentor, and discussing it with the other contributors, we decided instead to store the data in a cache and use a scheduler to refresh it periodically.
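For comparison, the asynchronous route mentioned above would overlap the per-exporter requests instead of sending them one after another. Below is a minimal stdlib sketch with `ThreadPoolExecutor`, assuming a `fetch(url)` callable that performs one request (in the real code, something like `requests.get(url, headers=headers).json()`). `fetch_all` and `fake_fetch` are hypothetical, the URLs are placeholders, and this is not what was merged.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL concurrently; results come back in input order."""
    # Threads suit this I/O-bound work: each worker mostly waits on the network
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for one GitHub API call; sleeps to mimic network latency
    time.sleep(0.2)
    return {"url": url}


urls = [f"https://example.invalid/tree/{i}" for i in range(8)]
start = time.perf_counter()
results = fetch_all(urls, fake_fetch)
elapsed = time.perf_counter() - start
# Sequentially this would take about 8 * 0.2 = 1.6 s; with 8 workers it is much shorter
print(len(results), round(elapsed, 2))
```

Overlapping requests shortens the wait, but every page load would still send one request per exporter against GitHub's API rate limit, which is one argument for the cached approach chosen here.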

I first looked at Memcached, one of Django's cache backends. However, Memcached needs its own cache server running, and here the data is cached in the scheduler container but read in the api container, so sharing the cached data that way was awkward. Instead I chose Django's file-based cache, which saves cached data as files.
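The idea behind file-based caching fits in a few lines: each entry is serialized to a file together with its expiry time, and a read returns nothing once the entry has expired. The class below is a stdlib-only illustration of that idea under a hypothetical name, `TinyFileCache`; it is not Django's `FileBasedCache`, which does more (for example, culling old entries).

```python
import hashlib
import os
import pickle
import tempfile
import time


class TinyFileCache:
    """Illustrative file cache: one pickle file per key holding (expires_at, value)."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string becomes a safe file name
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".cache")

    def set(self, key, value, timeout):
        expires_at = time.time() + timeout
        # Write to a temporary file first, then rename it into place,
        # so a concurrent reader never sees a half-written file
        tmp_path = self._path(key) + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump((expires_at, value), f)
        os.replace(tmp_path, self._path(key))

    def get(self, key, default=None):
        try:
            with open(self._path(key), "rb") as f:
                expires_at, value = pickle.load(f)
        except FileNotFoundError:
            return default
        # Expired entries behave like missing ones
        if time.time() >= expires_at:
            return default
        return value


file_cache = TinyFileCache(tempfile.mkdtemp())
file_cache.set("exporter_content", {"node_exporter": {"I": True, "A": False, "G": True}}, 60 * 61)
file_cache.set("expired", "stale", 0)  # a timeout of 0 expires immediately
print(file_cache.get("exporter_content"))  # {'node_exporter': {'I': True, 'A': False, 'G': True}}
print(file_cache.get("expired"))           # None
```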

views.py

ExporterView reads the cached data.
```
exporter_content = cache.get('exporter_content')
```

When an exporter's content is updated, the cached data is updated too.

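```python
from django.core.cache import cache

# code_file_name and app_name are defined by the surrounding view code
content_code = {'helm': 'I', 'alert': 'A', 'dashboard': 'G'}.get(
    code_file_name.split("_")[-1].split(".")[0].strip()
)
exporter_content = cache.get('exporter_content')
# Skip unknown file types and a cache the scheduler has not filled yet
if content_code and exporter_content is not None:
    exporter_content.setdefault(app_name, {'I': False, 'A': False, 'G': False})[content_code] = True
    cache.set('exporter_content', exporter_content, 60 * 61)  # 61 minutes
```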

settings.py

Cache settings for using Django's file-based cache

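```python
# Django file-based cache
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        # A relative path resolves against the process's working directory;
        # Django's documentation recommends an absolute path here.
        'LOCATION': 'exporter/django_cache',
    }
}
```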

scheduler.py

Every hour, query the exporters' content data through the GitHub API and store it in the file cache

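```python
import requests
from django.conf import settings
from django.core.cache import cache

# db_auto_reconnect and the Token model come from the project's own code (imports omitted).


@db_auto_reconnect
def get_exporter_contents():
    """Check which contents each exporter has and cache the result."""
    token = Token.objects.last()
    if token is None:
        # No GitHub token registered yet: skip this run instead of failing on None.token
        return
    headers = {'Authorization': 'token ' + token.token}
    repo = f"{settings.ORGANIZATION}/exporterhub.io"
    exp_lst = requests.get(f"https://api.github.com/repos/{repo}/contents/contents/", headers=headers)
    git_exp_list = exp_lst.json()
    # File-name suffix -> single-letter content code
    content_type = {'helm': 'I', 'alert': 'A', 'dashboard': 'G'}
    exporter_content = {}

    for exp in git_exp_list:
        app_name = exp['name']
        if app_name != 'README.md':
            exporter_content[app_name] = {'I': False, 'A': False, 'G': False}
            # One extra request per exporter to list the files in its directory
            response = requests.get(exp["git_url"], headers=headers)
            exp_data = response.json()

            for item in exp_data['tree'][::2]:
                content = item['path'].split("_")[-1].split(".")[0].strip()
                if content in content_type:
                    exporter_content[app_name][content_type[content]] = True

    # 60 * 61 seconds = 61 minutes, one minute longer than the hourly schedule
    cache.set('exporter_content', exporter_content, 60 * 61)
```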

Caching schedule with APScheduler to refresh the data every hour

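```python
from datetime import datetime

from apscheduler.triggers.cron import CronTrigger

# scheduler is the project's APScheduler instance and logger its module logger.
# Periodically cache the data on the exporters' contents.
scheduler.add_job(
    get_exporter_contents,
    trigger=CronTrigger(hour='*/1'),  # minute 0 of every hour
    id='get_exporter_contents',
    max_instances=1,
    replace_existing=True,
    misfire_grace_time=120,
    next_run_time=datetime.now(),  # also run once right away at startup
)
logger.info("Added job 'get_exporter_contents'")
```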
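With `CronTrigger(hour='*/1')` and no `minute` given, APScheduler fills the smaller fields with their minimum values, so the job fires at minute 0 of every hour; `next_run_time=datetime.now()` additionally runs it once at startup so the cache is filled without waiting for the next hour. The hypothetical stdlib helper below computes the same top-of-hour schedule and checks that the 61-minute cache timeout outlives the gap between two runs. The start time is arbitrary.

```python
from datetime import datetime, timedelta

CACHE_TIMEOUT = timedelta(seconds=60 * 61)  # same value passed to cache.set


def next_top_of_hour(now):
    """Return the next time at minute 0, second 0 strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


start = datetime(2021, 7, 1, 23, 30)  # arbitrary example start time
first_cron_run = next_top_of_hour(start)
second_cron_run = next_top_of_hour(first_cron_run)
print(first_cron_run, second_cron_run)  # 2021-07-02 00:00:00 2021-07-02 01:00:00
# An entry written by one run is still valid when the next run replaces it
print(second_cron_run - first_cron_run < CACHE_TIMEOUT)  # True
```

The margin is only one minute, so a run that starts late (`misfire_grace_time=120` lets a missed run start up to two minutes after its scheduled time) could leave a short window where the cached entry has already expired.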

Another hurdle: the cached data couldn't be read

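I set up the scheduler to write the data to the file cache periodically, but there was a problem. Exporterhub.io runs as four containers (Front, API, DB, Scheduler), and because the scheduler container wrote the cache files into its own filesystem, the separate API container could not use the cached data.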

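My simple fix was to mount the same local directory into both the API container and the scheduler container. The scheduler writes the cache files into that local directory, and because the directory is also mounted into the API container, the API can use the cached data.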

Mounting the local directory with docker-compose

```yaml
# docker-compose.yml
version: "3.1"
services:
    scheduler:
        volumes:
            - ./api/exporter/django_cache:/data/exporter/django_cache
    api:
        volumes:
            - ./api/exporter/django_cache:/data/exporter/django_cache
```
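Why the shared mount matters can be shown without Docker: cache files are visible only to processes that read the same directory. In this stdlib-only sketch, two separate temporary directories stand in for the scheduler and api containers' own filesystems, and one shared directory stands in for the bind-mounted `./api/exporter/django_cache`. `write_cache` and `read_cache` are hypothetical stand-ins for `cache.set` / `cache.get`, not Django APIs.

```python
import os
import pickle
import tempfile


def write_cache(directory, key, value):
    # Stand-in for cache.set(): persist the value as a file in the cache directory
    with open(os.path.join(directory, key), "wb") as f:
        pickle.dump(value, f)


def read_cache(directory, key):
    # Stand-in for cache.get(): None when this directory has no such file
    try:
        with open(os.path.join(directory, key), "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None


data = {"node_exporter": {"I": True, "A": True, "G": False}}

# Without a shared volume: each container writes to and reads from its own filesystem
scheduler_fs, api_fs = tempfile.mkdtemp(), tempfile.mkdtemp()
write_cache(scheduler_fs, "exporter_content", data)
print(read_cache(api_fs, "exporter_content"))  # None

# With the bind mount: both services point at the same host directory
shared = tempfile.mkdtemp()
write_cache(shared, "exporter_content", data)
print(read_cache(shared, "exporter_content") == data)  # True
```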

A PR that was merged and then rolled back

https://github.com/NexClipper/exporterhub.io/pull/90
For this issue I built the backend data, improved performance with file caching, and opened a PR. It was merged, but an error occurred during deployment.

The reason:

By design, this project fetches its data with a GitHub API token. So before a user enters a token, the data cannot be fetched, the whole Exporter list comes back as None, and the page does not render (this is my guess).

So it looks like I need to discuss this further with the front-end side. Phew...
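If the guess above is right, the view has to cope with an empty cache: Django's `cache.get` returns `None` when the key is missing, for example before the scheduler has ever run with a token. A minimal, hypothetical sketch of a defensive lookup that falls back to "no extra contents" instead of failing, so the exporter list can still render; `flags_for` and `DEFAULT_FLAGS` are illustrative names, not project code.

```python
DEFAULT_FLAGS = {'I': False, 'A': False, 'G': False}


def flags_for(exporter_content, app_name):
    """Return one exporter's content flags, tolerating a missing cache or entry."""
    # cache.get('exporter_content') yields None when nothing has been cached yet
    if not exporter_content:
        return dict(DEFAULT_FLAGS)
    # A newly registered exporter may be absent until the next hourly run
    return dict(exporter_content.get(app_name, DEFAULT_FLAGS))


print(flags_for(None, "node_exporter"))
# {'I': False, 'A': False, 'G': False}
print(flags_for({"node_exporter": {'I': True, 'A': False, 'G': True}}, "node_exporter"))
# {'I': True, 'A': False, 'G': True}
```

Returning a copy keeps callers from accidentally mutating the shared defaults.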
