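SparklingSoDA 4.0 builds an image by collecting partial.Dockerfile files, writing them into a temp.Dockerfile, and building that file through the Docker API.
To see which partial.Dockerfile files are available, check the spsd deployment server (192.168.100.129).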
pwd
/data/data04/docker_image_builder/partials
ll
total 204
-rw-rw-r-- 1 aiadmin aiadmin 2073 May 16  2023 chip.partial.Dockerfile
drwxrwxr-x 2 aiadmin aiadmin  144 Jun 10 14:18 cuda
-rw-rw-r-- 1 aiadmin aiadmin  230 May 13 15:14 cuda_116_ubuntu2004.partial.Dockerfile
-rw-rw-r-- 1 aiadmin aiadmin  229 May 16  2023 cuda_ubuntu.partial.Dockerfile
-rw-rw-r-- 1 aiadmin aiadmin 2187 Jun  7  2023 jupyter.partial.Dockerfile
-rw-rw-r-- 1 aiadmin aiadmin  624 May 16  2023 nexus_repo_bionic_release.partial.Dockerfile
-rw-rw-r-- 1 aiadmin aiadmin  623 May 16  2023 nexus_repo_focal_release.partial.Dockerfile
-rw-rw-r-- 1 aiadmin aiadmin  195 May 16  2023 nexus_repo_online.partial.Dockerfile
Once you have chosen the partial.Dockerfile files to use, list them under slice_sets in spec.yml as follows.
slice_sets:
  python_311:  # name used later in tag_specs
    - add_to_name: "_py311"  # becomes part of the tag after the image is built
      args:
        - PYTHON_VERSION=3.11  # value of the ARG passed to the Dockerfile
      partials:
        - python_mt310  # name of the partial.Dockerfile
  vscode_sodaflow:
    - add_to_name: ""
      args:
        - SPSD_VERSION=1.2.32
      partials:
        - vscode
        - sodaflow
  python_package_sllm:
    - add_to_name: ""
      partials:
        - python_packages_sllm
  cuda116_ubuntu2004:
    - add_to_name: "cuda_116_ubuntu_2004"
      partials:
        - cuda_116_ubuntu2004
  python_310:
    - add_to_name: "_py310"
      args:
        - PYTHON_VERSION=3.10
      partials:
        - python_310
  torch_cu116:
    - add_to_name: ""
      args:
        # - TORCH_VERSION=1.10.1+cu113-cp37-cp37m-linux_x86_64
        # - TORCHVISION_VERSION=0.11.2+cu113-cp37-cp37m-linux_x86_64
        - TORCH_VERSION=1.13.0+cu116-cp310-cp310-linux_x86_64
        - TORCHVISION_VERSION=0.14.0+cu116-cp310-cp310-linux_x86_64
        - SPSD_VERSION=1.2.30
      partials:
        - torch_cu116
        - jupyter
        - triton_client
        - sodaflow
Then add an entry under releases in spec.yml as follows.
releases:
  # Built Nightly and pushed to tensorflow/tensorflow
  nightly:
    tag_specs:
      - "{nightly}{jupyter}"
      - "{_TAG_PREFIX}{ubuntu-devel}"
  sejong_cuda110_vscode_basd:  # name used later in the shell script
    tag_specs:
      # Put the names defined in slice_sets inside {...}; the tag is built from their add_to_name values
      - "{cuda_110}{python_37}{torch_vscode_sejeong}{python_package_basd}{nexus_repo_online}{run_vscode}"
  sejong_cuda110_jupyter_basd:
    tag_specs:
      - "{cuda_110}{python_37}{torch_jupyter_sejeong}{python_package_basd}{nexus_repo_online}{run_jupyter}"
  sejong_scode:
    tag_specs:
      - "{cuda_110}{basd_sejeong}"
spec.yml is broadly divided into two sections: releases and slice_sets.
Then write a shell script like the following in /data/data04/docker_image_builder.
#!/bin/bash
function asm_images() {
  sudo docker run --rm \
    -v $(pwd):/tf \
    -v /etc/docker/daemon.json:/etc/docker/daemon.json \
    -v /var/run/docker.sock:/var/run/docker.sock \
    build-tools:latest \
    python3 assemble.py "$@"
}

# --release:    release name defined under releases in spec.yml
# --arg:        lets you put a string of your choice into the image tag
# --image_name: the actual image name
# (Comments are kept above the command: a comment after a trailing "\" breaks the line continuation.)
asm_images \
    --release gwangju_cuda116_ubuntu2004 \
    --arg _TAG_PREFIX=v240513 \
    --build_images \
    --repository hub.sparklingsoda.io:80 \
    --image_name vscode

echo "All Done."
After writing the shell script, run it.
★ How assemble.py builds an image
※ The options that can be passed on the command line are defined at the top of assemble.py.
FLAGS = flags.FLAGS

flags.DEFINE_string('image_name', None, 'build image name')

flags.DEFINE_string(
    'repository', 'tensorflow',
    'Tag local images as {repository}:tag (in addition to the '
    'hub_repository, if uploading to hub)')

flags.DEFINE_boolean(
    'build_images', False, 'Do not build images', short_name='b')

flags.DEFINE_multi_string(
    'release', [],
    'Set of releases to build and tag. Defaults to every release type.',
    short_name='r')

flags.DEFINE_multi_string(
    'arg', [],
    ('Extra build arguments. These are used for expanding tag names if needed '
     '(e.g. --arg _TAG_PREFIX=foo) and for using as build arguments (unused '
     'args will print a warning).'),
    short_name='a')
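To see how the shell script's options map onto these flags, here is a minimal sketch that uses the standard-library argparse module as a stand-in for the flags library used by assemble.py. The build_parser function and the example_argv list are illustrative names, not part of assemble.py; the option names and defaults are copied from the definitions above.

```python
import argparse


def build_parser():
    """argparse stand-in (an assumption) for the flag definitions above."""
    parser = argparse.ArgumentParser(prog="assemble.py")
    parser.add_argument("--image_name", default=None, help="build image name")
    parser.add_argument("--repository", default="tensorflow")
    # store_true mirrors a boolean flag whose default is False.
    parser.add_argument("-b", "--build_images", action="store_true")
    # action="append" mirrors a multi-string flag: the option may be repeated.
    parser.add_argument("-r", "--release", action="append", default=[])
    parser.add_argument("-a", "--arg", action="append", default=[])
    return parser


# Same options as the shell script passes to assemble.py.
example_argv = [
    "--release", "release_vscode_py310",
    "--arg", "_TAG_PREFIX=v240605",
    "--build_images",
    "--repository", "hub.sparklingsoda.io:80",
    "--image_name", "vscode",
]
parsed = build_parser().parse_args(example_argv)
print(parsed.release, parsed.arg, parsed.build_images)
```

Because --release and --arg are multi-valued, each can be given more than once, and an empty --release list means that every release is considered.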
i. Load spec.yml (stored in tag_spec)
def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  # Read the full spec file, used for everything
  with open(FLAGS.spec_file, 'r') as spec_file:
    tag_spec = yaml.safe_load(spec_file)
ii. Read every partial Dockerfile in the partials directory into partials, keyed by partial name
  # Get existing partial contents
  partials = gather_existing_partials(FLAGS.partial_dir)

def gather_existing_partials(partial_path):
  """Find and read all available partials.

  Args:
    partial_path (string): read partials from this directory.

  Returns:
    Dict[string, string] of partial short names (like "ubuntu/python" or
      "bazel") to the full contents of that partial.
  """
  partials = {}
  for path, _, files in os.walk(partial_path):
    for name in files:
      fullpath = os.path.join(path, name)
      if '.partial.Dockerfile' not in fullpath:
        eprint(('> Probably not a problem: skipping {}, which is not a '
                'partial.').format(fullpath))
        continue
      # partial_dir/foo/bar.partial.Dockerfile -> foo/bar
      simple_name = fullpath[len(partial_path) + 1:-len('.partial.dockerfile')]
      with open(fullpath, 'r', -1, 'utf-8') as f:
        partial_contents = f.read()
      partials[simple_name] = partial_contents
  return partials
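The short name is what you write under partials in spec.yml, so it helps to see how it is derived from the file path. The following sketch isolates only the slicing step from the function above. simple_partial_name is an illustrative helper, and the file inside cuda/ and the README.md entry are hypothetical paths used for demonstration.

```python
SUFFIX = ".partial.Dockerfile"


def simple_partial_name(partial_dir, fullpath):
    """Return the short name of a partial, or None if the file is not a partial."""
    if SUFFIX not in fullpath:
        return None
    # Drop "<partial_dir>/" from the front and the suffix from the back.
    return fullpath[len(partial_dir) + 1:-len(SUFFIX)]


partial_dir = "partials"
paths = [
    "partials/jupyter.partial.Dockerfile",
    "partials/cuda/example.partial.Dockerfile",  # hypothetical file in a subdirectory
    "partials/README.md",                        # hypothetical non-partial file
]
names = [simple_partial_name(partial_dir, p) for p in paths]
print(names)  # ['jupyter', 'cuda/example', None]
```

A partial stored in a subdirectory therefore keeps the subdirectory in its short name, which is the name spec.yml has to use.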
iii. Validate spec.yml against SCHEMA_TEXT and normalize it (hereafter, spec.yml is referred to as spec)
  # Abort if spec.yaml is invalid
  schema = yaml.safe_load(SCHEMA_TEXT)
  v = TfDockerTagValidator(schema, partials=partials)
  if not v.validate(tag_spec):
    eprint('> Error: {} is an invalid spec! The errors are:'.format(
        FLAGS.spec_file))
    eprint(yaml.dump(v.errors, indent=2))
    exit(1)
  tag_spec = v.normalized(tag_spec)
The YAML schema used as the basis for validation
SCHEMA_TEXT = """
header:
  type: string

slice_sets:
  type: dict
  keyschema:
    type: string
  valueschema:
    type: list
    schema:
      type: dict
      schema:
        add_to_name:
          type: string
        dockerfile_exclusive_name:
          type: string
        dockerfile_subdirectory:
          type: string
        partials:
          type: list
          schema:
            type: string
            ispartial: true
        test_runtime:
          type: string
          required: false
        tests:
          type: list
          default: []
          schema:
            type: string
        args:
          type: list
          default: []
          schema:
            type: string
        args:
          type: list
          default: []
          schema:
            type: string
            isfullarg: true

releases:
  type: dict
  keyschema:
    type: string
  valueschema:
    type: dict
    schema:
      is_dockerfiles:
        type: boolean
        required: false
        default: false
      upload_images:
        type: boolean
        required: false
        default: true
      tag_specs:
        type: list
        required: true
        schema:
          type: string
"""
If spec.yml lists a partial that does not exist in the partials directory, validation reports an error.
class TfDockerTagValidator(cerberus.Validator):
  """Custom Cerberus validator for TF tag spec.

  Note: Each _validate_foo function's docstring must end with a segment
  describing its own validation schema, e.g. "The rule's arguments are...". If
  you add a new validator, you can copy/paste that section.
  """

  def __init__(self, *args, **kwargs):
    # See http://docs.python-cerberus.org/en/stable/customize.html
    if 'partials' in kwargs:
      self.partials = kwargs['partials']
    super(cerberus.Validator, self).__init__(*args, **kwargs)

  def _validate_ispartial(self, ispartial, field, value):
    """Validate that a partial references an existing partial spec.

    Args:
      ispartial: Value of the rule, a bool
      field: The field being validated
      value: The field's value
    The rule's arguments are validated against this schema:
    {'type': 'boolean'}
    """
    if ispartial and value not in self.partials:
      self._error(field,
                  '{} is not present in the partials directory.'.format(value))

  def _validate_isfullarg(self, isfullarg, field, value):
    """Validate that a string is either a FULL=arg or NOT.

    Args:
      isfullarg: Value of the rule, a bool
      field: The field being validated
      value: The field's value
    The rule's arguments are validated against this schema:
    {'type': 'boolean'}
    """
    if isfullarg and '=' not in value:
      self._error(field, '{} should be of the form ARG=VALUE.'.format(value))
    if not isfullarg and '=' in value:
      self._error(field, '{} should be of the form ARG (no =).'.format(value))
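The two custom rules can be tried out without Cerberus. The sketch below reimplements only the ispartial and isfullarg checks as a plain function for one slice; check_slice and the sample dictionaries are illustrative and not part of assemble.py, and the error messages are copied from the validator above.

```python
def check_slice(slice_def, partials):
    """Return error messages for one slice, mirroring ispartial and isfullarg."""
    errors = []
    for name in slice_def.get("partials", []):
        # ispartial: every listed partial must exist in the partials dict.
        if name not in partials:
            errors.append(
                "{} is not present in the partials directory.".format(name))
    for arg in slice_def.get("args", []):
        # isfullarg: every arg must be written as ARG=VALUE.
        if "=" not in arg:
            errors.append("{} should be of the form ARG=VALUE.".format(arg))
    return errors


# Sample data for illustration only.
available = {"python_310": "ARG PYTHON_VERSION\n"}
good = {"add_to_name": "_py310",
        "args": ["PYTHON_VERSION=3.10"],
        "partials": ["python_310"]}
bad = {"add_to_name": "_py311",
       "args": ["PYTHON_VERSION"],
       "partials": ["python_mt310"]}

print(check_slice(good, available))  # []
for message in check_slice(bad, available):
    print(message)
```

In this sample, bad fails both rules: python_mt310 is missing from the sample partials, and PYTHON_VERSION has no value.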
iv. Run the assemble_tags function
  # Assemble tags and images used to build them
  all_tags = assemble_tags(tag_spec, FLAGS.arg, FLAGS.release, partials)
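a. In spec['releases'], find the releases whose names were passed with --release in the shell script (every release is used if none was passed)
b. In spec['slice_sets'], find the slice sets named in the tag_specs of the releases found in a.
c. Collect all information stored in spec['slice_sets'] for the slice sets found in b.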
def assemble_tags(spec, cli_args, enabled_releases, all_partials):
  """Gather all the tags based on our spec.

  Args:
    spec: Nested dict containing full Tag spec
    cli_args: List of ARG=foo arguments to pass along to Docker build
    enabled_releases: List of releases to parse. Empty list = all
    all_partials: Dict of every partial, for reference

  Returns:
    Dict of tags and how to build them
  """
  tag_data = collections.defaultdict(list)

  for name, release in spec['releases'].items():
    for tag_spec in release['tag_specs']:
      if enabled_releases and name not in enabled_releases:
        eprint('> Skipping release {}'.format(name))
        continue

      used_slice_sets, required_cli_args = get_slice_sets_and_required_args(
          spec['slice_sets'], tag_spec)

      slice_combos = aggregate_all_slice_combinations(spec, used_slice_sets)
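The bodies of get_slice_sets_and_required_args and aggregate_all_slice_combinations are not shown in this article. As a rough sketch of what steps b. and c. amount to, and assuming that a {...} name not found in slice_sets is treated as a value that must come from --arg, the logic could look like this. split_placeholders and slice_combinations are hypothetical helpers, not the actual assemble.py functions.

```python
import itertools
import re


def split_placeholders(slice_sets, tag_spec):
    """Split the {...} names in a tag_spec into slice-set names and CLI args."""
    names = re.findall(r"\{([^}]+)\}", tag_spec)
    used = [n for n in names if n in slice_sets]
    # Assumption: anything that is not a slice set must be supplied via --arg.
    required_cli_args = [n for n in names if n not in slice_sets]
    return used, required_cli_args


def slice_combinations(slice_sets, used):
    """Pick one slice from each used slice set, in every possible combination."""
    return list(itertools.product(*(slice_sets[n] for n in used)))


# Values taken from the slice_sets shown in this article.
slice_sets = {
    "cuda116_ubuntu2004": [
        {"add_to_name": "cuda_116_ubuntu_2004",
         "partials": ["cuda_116_ubuntu2004"]},
    ],
    "python_310": [
        {"add_to_name": "_py310",
         "args": ["PYTHON_VERSION=3.10"],
         "partials": ["python_310"]},
    ],
}
tag_spec = "{_TAG_PREFIX}{cuda116_ubuntu2004}{python_310}"

used, required_cli_args = split_placeholders(slice_sets, tag_spec)
combos = slice_combinations(slice_sets, used)
print(used)               # ['cuda116_ubuntu2004', 'python_310']
print(required_cli_args)  # ['_TAG_PREFIX']
print(len(combos))        # 1
```

Each slice set in this article holds a single slice, so one tag_spec yields one combination; a slice set with several list entries would multiply the number of images built.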
d. From the information gathered in c., collect only the args
      for slices in slice_combos:
        tag_args = gather_tag_args(slices, cli_args, required_cli_args)
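gather_tag_args itself is not reproduced here. The sketch below shows one plausible way to merge the args of the selected slices with the --arg values into a single dictionary, which is the shape the later build step expects for its build arguments. merge_args and missing_required are hypothetical helpers, and letting command-line values override slice values is an assumption of this sketch.

```python
def merge_args(slices, cli_args):
    """Merge KEY=VALUE strings from the slices and from --arg into one dict."""
    pairs = []
    for s in slices:
        pairs.extend(s.get("args", []))
    # CLI values are applied last, so they win on a name clash (assumption).
    pairs.extend(cli_args)
    merged = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        merged[key] = value
    return merged


def missing_required(merged, required_cli_args):
    """List the tag_spec placeholders for which no value was supplied."""
    return [name for name in required_cli_args if name not in merged]


example_slices = [
    {"add_to_name": "cuda_116_ubuntu_2004", "partials": ["cuda_116_ubuntu2004"]},
    {"add_to_name": "_py310", "args": ["PYTHON_VERSION=3.10"],
     "partials": ["python_310"]},
]
tag_args = merge_args(example_slices, ["_TAG_PREFIX=v240605"])
print(tag_args)  # {'PYTHON_VERSION': '3.10', '_TAG_PREFIX': 'v240605'}
print(missing_required(merge_args(example_slices, []), ["_TAG_PREFIX"]))
# ['_TAG_PREFIX']
```

The second call shows the situation where a tag_spec contains {_TAG_PREFIX} but the script was run without --arg _TAG_PREFIX=...; a real implementation would be expected to stop with an error at that point.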
e. Build the tag from the args of d. and the add_to_name values in spec['slice_sets']
        tag_name = build_name_from_slices(tag_spec, slices, tag_args,
                                          release['is_dockerfiles'])
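How the tag string comes out of a tag_spec can be illustrated with ordinary string formatting. The sketch below assumes that every {slice_set} placeholder is replaced by that slice's add_to_name and every other placeholder by the matching arg value; build_tag_name is a hypothetical stand-in that ignores the is_dockerfiles case handled by build_name_from_slices.

```python
def build_tag_name(tag_spec, slices_by_set, tag_args):
    """Fill a tag_spec: slice-set names -> add_to_name, other names -> arg values."""
    values = dict(tag_args)
    for set_name, slice_def in slices_by_set.items():
        values[set_name] = slice_def["add_to_name"]
    # Unused keys (for example PYTHON_VERSION) are simply ignored by format().
    return tag_spec.format(**values)


# add_to_name values taken from the slice_sets shown in this article.
slices_by_set = {
    "cuda116_ubuntu2004": {"add_to_name": "cuda_116_ubuntu_2004"},
    "python_310": {"add_to_name": "_py310"},
}
tag_args = {"PYTHON_VERSION": "3.10", "_TAG_PREFIX": "v240605"}

with_prefix = build_tag_name(
    "{_TAG_PREFIX}{cuda116_ubuntu2004}{python_310}", slices_by_set, tag_args)
without_prefix = build_tag_name(
    "{cuda116_ubuntu2004}{python_310}", slices_by_set, tag_args)
print(with_prefix)     # v240605cuda_116_ubuntu_2004_py310
print(without_prefix)  # cuda_116_ubuntu_2004_py310
```

The same _TAG_PREFIX value is passed in both calls, but it only reaches the tag when the tag_spec contains the {_TAG_PREFIX} placeholder.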
f. From the information gathered in c., collect only the partials
        used_partials = gather_slice_list_items(slices, 'partials')
g. Join the contents of the partial.Dockerfile files to be used
        dockerfile_contents = merge_partials(spec['header'], used_partials,
                                             all_partials)

def merge_partials(header, used_partials, all_partials):
  """Merge all partial contents with their header."""
  used_partials = list(used_partials)
  return '\n'.join([header] + [all_partials[u] for u in used_partials])
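A quick way to see what merge_partials produces is to call it on a small dictionary. The function below is copied from the excerpt above; the header text and the partial contents are placeholder strings made up for this example.

```python
def merge_partials(header, used_partials, all_partials):
    """Merge all partial contents with their header."""
    used_partials = list(used_partials)
    return '\n'.join([header] + [all_partials[u] for u in used_partials])


# Placeholder contents for illustration; real partials are full Dockerfile fragments.
all_partials = {
    "python_310": "ARG PYTHON_VERSION\nRUN echo python",
    "jupyter": "RUN echo jupyter",
    "unused": "RUN echo unused",
}
merged = merge_partials("# header", ["python_310", "jupyter"], all_partials)
print(merged)
```

The partials are concatenated in the order in which they are listed, so the order under partials in spec.yml is the order of the instructions in the generated Dockerfile.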
h. Return the information gathered in a. through f. together with the joined Dockerfile contents from g.
        tag_data[tag_name].append({
            'release': name,
            'tag_spec': tag_spec,
            'is_dockerfiles': release['is_dockerfiles'],
            'upload_images': release['upload_images'],
            'cli_args': tag_args,
            'dockerfile_subdirectory': dockerfile_subdirectory or '',
            'partials': used_partials,
            'tests': used_tests,
            'test_runtime': test_runtime,
            'dockerfile_contents': dockerfile_contents,
        })

  return tag_data
v. Write the Dockerfile contents assembled in g. to a temp.Dockerfile
      # Generate a temporary Dockerfile to use to build, since docker-py
      # needs a filepath relative to the build context (i.e. the current
      # directory)
      dockerfile = os.path.join(FLAGS.dockerfile_dir, tag + '.temp.Dockerfile')
      if not FLAGS.dry_run:
        with open(dockerfile, 'w', -1, 'utf-8') as f:
          f.write(tag_def['dockerfile_contents'])
      eprint('>> (Temporary) writing {}...'.format(dockerfile))
vi. Build the temp.Dockerfile through the Docker API, passing cli_args (the information gathered in d.)
      tag_failed = False
      image, logs = None, []
      if not FLAGS.dry_run:
        try:
          # Use low level APIClient in order to stream log output
          resp = dock.api.build(
              timeout=FLAGS.hub_timeout,
              path='.',
              nocache=FLAGS.nocache,
              dockerfile=dockerfile,
              buildargs=tag_def['cli_args'],
              network_mode='host',
              tag=repo_tag)
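The buildargs dictionary plays the same role as repeated --build-arg KEY=VALUE options on the docker build command line. The sketch below only renders such a dictionary as the equivalent flag list, to make clear what reaches the Dockerfile's ARG instructions; as_build_arg_flags is an illustrative helper and is not used by assemble.py.

```python
def as_build_arg_flags(buildargs):
    """Render a buildargs dict as the equivalent docker CLI --build-arg flags."""
    flags = []
    for key in sorted(buildargs):
        flags.extend(["--build-arg", "{}={}".format(key, buildargs[key])])
    return flags


# Example values in the shape of tag_def['cli_args'].
cli_args = {"PYTHON_VERSION": "3.10", "_TAG_PREFIX": "v240605"}
build_flags = as_build_arg_flags(cli_args)
print(" ".join(build_flags))
# --build-arg PYTHON_VERSION=3.10 --build-arg _TAG_PREFIX=v240605
```

Every key in the dictionary is offered to the build, but a value only has an effect inside the Dockerfile if a matching ARG instruction declares it.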
※ Caution
1. Passing --arg in the shell script does not by itself add anything to the tag
The tag is formed from the add_to_name values that replace each {...} in tag_specs.
gwangju_cuda116_ubuntu2004:
  tag_specs:
    - "{cuda116_ubuntu2004}{python_310}{torch_vscode_cu116}{python_package_sllm}{nexus_repo_focal_release}{run_vscode}"

cuda116_ubuntu2004:
  - add_to_name: "cuda_116_ubuntu_2004"
    partials:
      - cuda_116_ubuntu2004
python_310:
  - add_to_name: "_py310"
    args:
      - PYTHON_VERSION=3.10
    partials:
      - python_310
To use an --arg value in the tag, tag_specs must include the placeholder that matches that --arg.
release_cuda116_py310:
  tag_specs:
    - "{_TAG_PREFIX}{cuda116_ubuntu2004}{python_310}{torch_vscode_cu116}{nexus_repo_focal_release}{run_vscode}"
release_vscode_py311:
  tag_specs:
    - "{_TAG_PREFIX}{ubuntu}{python_311}{vscode_sodaflow}{nexus_repo_focal_release}{run_vscode}"
release_vscode_py310:
  tag_specs:
    - "{_TAG_PREFIX}{ubuntu}{python_310}{vscode_sodaflow}{nexus_repo_focal_release}{run_vscode}"
release_vscode_torch_cpu_py310:
  tag_specs:
    - "{_TAG_PREFIX}{ubuntu}{python_310}{torch_py310_vscode_cpu}{nexus_repo_focal_release}{run_vscode}"
#!/bin/bash
function asm_images() {
  sudo docker run --rm \
    -v $(pwd):/tf \
    -v /etc/docker/daemon.json:/etc/docker/daemon.json \
    -v /var/run/docker.sock:/var/run/docker.sock \
    build-tools:latest \
    python3 assemble.py "$@"
}

asm_images \
    --release release_vscode_torch_cpu_py310 \
    --arg _TAG_PREFIX=v240605 \
    --build_images \
    --repository hub.sparklingsoda.io:80 \
    --image_name vscode

echo "All Done."
sudo docker images | grep vscode
hub.sparklingsoda.io:80/vscode   v240605_py310_torch                  cda89fc3910d   24 hours ago   4.17GB
hub.sparklingsoda.io:80/vscode   v240605_py310                        f2ee3ecab95a   26 hours ago   3.2GB
hub.sparklingsoda.io:80/vscode   v240610_py311                        be9cf44cc0b7   29 hours ago   3.28GB
hub.sparklingsoda.io:80/vscode   v240605_cuda_116_ubuntu_2004_py310   ed2b404332e6   5 days ago     14.3GB
hub.sparklingsoda.io:80/vscode   v240605cuda_116_ubuntu_2004_py310    8e3fafc46ff2   6 days ago     6GB
hub.sparklingsoda.io:80/vscode   cuda_116_ubuntu_2004_py310           7f161e5f9f14   3 weeks ago    15.1GB
2. Even if args are set in spec.yml, the partial Dockerfile must still declare the ARG (it does not need a value)
Example
ARG PYTHON_VERSION
ENV DEBIAN_FRONTEND=noninteractive

###########################################################################
## Python ${PYTHON_VERSION}
RUN apt-get install -y software-properties-common \
    && apt update \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev
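A missing ARG declaration is easy to overlook, so a small check before building can help. The sketch below scans a partial's text for ARG lines and reports which spec.yml args have no declaration. undeclared_args is a hypothetical helper rather than part of assemble.py, and it assumes the instruction is written in upper case as in the example above.

```python
import re


def undeclared_args(dockerfile_text, spec_args):
    """Return the arg names from spec.yml that the Dockerfile text never declares."""
    declared = set(re.findall(
        r"^\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)", dockerfile_text,
        flags=re.MULTILINE))
    names = [arg.split("=", 1)[0] for arg in spec_args]
    return [name for name in names if name not in declared]


# Shortened version of the example partial above.
partial_text = (
    "ARG PYTHON_VERSION\n"
    "ENV DEBIAN_FRONTEND=noninteractive\n"
    "RUN apt install -y python${PYTHON_VERSION}\n"
)
print(undeclared_args(partial_text, ["PYTHON_VERSION=3.10"]))
# []
print(undeclared_args(partial_text, ["PYTHON_VERSION=3.10", "SPSD_VERSION=1.2.32"]))
# ['SPSD_VERSION']
```

In the second call, SPSD_VERSION is set in the spec but never declared in this partial, so its value would not be available to the partial's instructions.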