Training set with Stellarium II

Checking where to go next with our “super-awesome” 42GB training set, I’ve realized that for the next step we’ll need a much lower resolution. We are now at 1920 x 1080, while many traditional neural networks like ResNet happily cope with a much more modest 224 x 224.

So, on to some tiny Python scripting to get our set converted to 224 x 224, and grayscale too! The following script does that magic in about 90 minutes.

#!/usr/bin/env python3

from argparse import ArgumentParser

from PIL import Image, ImageOps
import os
from glob import glob

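def transform(im, args):
  # Crop box centred on the image, (width*scale) x (height*scale) pixels in size.
  # With the defaults that is a 448 x 448 window, later resized to 224 x 224.
  left = im.width/2-args.width/2*args.scale
  top = im.height/2-args.height/2*args.scale
  right = im.width/2+args.width/2*args.scale
  bottom = im.height/2+args.height/2*args.scale
  return left, top, right, bottom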

def crop(args):
  result = [y for x in os.walk(args.input) for y in glob(os.path.join(x[0], '*.png'))]
  counter = 0

  for filename in result:
    with Image.open(filename) as im:
      im2 = im.crop(transform(im, args))
      im2 = im2.resize((args.width, args.height))
      if args.grayscale:
        im2 = ImageOps.grayscale(im2)
      # Mirror the input folder layout under the output folder
      filename_out = os.path.join(args.output, os.path.relpath(filename, args.input))
      os.makedirs(os.path.dirname(filename_out), exist_ok=True)
      im2.save(filename_out)
    counter += 1
    pct = round(counter/len(result)*100, 4)
    print("Finished processing for", filename_out, "\t[", pct, "%]")


def main():
  parser = ArgumentParser(description='Recursive crop & transformation for image files')

  parser.add_argument('--input', default='./train', help='input folder')
  parser.add_argument('--output', default='./train_224_224_monochrome', help='output folder')
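  # type= makes values given on the command line numeric instead of strings
  parser.add_argument('--height', default=224, type=int, help='image height')
  parser.add_argument('--width', default=224, type=int, help='image width')
  parser.add_argument('--grayscale', default=True, type=lambda s: s.lower() in ('1', 'true', 'yes'), help='convert output to grayscale (true/false)')
  parser.add_argument('--scale', default=2, type=float, help='scale down coefficient')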

  args = parser.parse_args()

  print('input  folder: ', args.input)
  print('output folder: ', args.output)

  crop(args)

  print("Done")

if __name__ == '__main__':
  main()
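To see what the crop actually does to one of our 1920 x 1080 frames, here is a small sketch that repeats the arithmetic from transform() without needing PIL. The function name center_crop_box is made up for this illustration; the defaults are the ones from the script above.

```python
def center_crop_box(im_width, im_height, out_width=224, out_height=224, scale=2):
    """Return (left, top, right, bottom) of a centred crop.

    Hypothetical helper mirroring transform() from the script: the crop
    window is (out_width * scale) x (out_height * scale) pixels.
    """
    half_w = out_width / 2 * scale
    half_h = out_height / 2 * scale
    cx, cy = im_width / 2, im_height / 2
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


box = center_crop_box(1920, 1080)
print(box)                               # (736.0, 316.0, 1184.0, 764.0)
print(box[2] - box[0], box[3] - box[1])  # 448.0 448.0
```

So with the default settings we keep only the central 448 x 448 pixels of every frame and then shrink that window by a factor of 2. Everything outside that square is thrown away.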

This whole operation reduced our initial 48.5GB monster training set to a much more convenient 1.4GB.

The result looks a bit radical, but it is what it is.
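As a rough sanity check on that reduction, we can compare the raw (uncompressed) pixel data per image. This is only a back-of-the-envelope sketch: it assumes the source frames are 8-bit RGB without an alpha channel, which I have not verified, and raw_bytes is just a helper name made up here.

```python
def raw_bytes(width, height, channels):
    """Uncompressed size of one image at 8 bits per channel (assumed)."""
    return width * height * channels


before = raw_bytes(1920, 1080, 3)  # assumed: full-HD RGB frame
after = raw_bytes(224, 224, 1)     # 224 x 224 grayscale
print(before, after)               # 6220800 50176
print(round(before / after))       # 124
```

On disk we went from 48.5GB to 1.4GB, which is roughly a factor of 35 rather than 124. Presumably that is down to PNG compression: the large, mostly dark sky frames compress much better per pixel than the small ones.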

Sirius 1920 x 1080 colour (before)
Sirius 224 x 224 colour (after)

Meanwhile Sebi kept reading and experimenting with machine learning and got some fantastic results there, but that’s for another post. 🙂
