First steps with the Remo Python library

The Remo Python library provides an intuitive way to visualize, clean and work with images for a variety of computer vision tasks.

Create a dataset

Adding data to Remo is as easy as passing the local path or URL of your data to the remo.create_dataset() function.

In this example, a sample dataset hosted online is added to Remo directly via its URL:

import remo

# To display Remo's viewer inline in a Jupyter Notebook, use the following setting
remo.set_viewer('jupyter')

urls = ['https://remo-scripts.s3-eu-west-1.amazonaws.com/open_images_sample_dataset.zip']

my_dataset = remo.create_dataset(name='open images detection',
                                 urls=urls,
                                 annotation_task='Object detection')

Acquiring data - completed
Processing data - completed
Data upload completed

That's it! Your images are now accessible and stored in a centralised place.

Remo supports a number of annotation formats and tasks out of the box. You can read more in the documentation.
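create_dataset() takes either URLs or local paths. If your data lives in a mix of places, a small helper can sort the locations before the call. This is a minimal sketch: split_sources is a hypothetical helper, treating s3:// as a remote scheme is an assumption, and the local_files keyword in the commented call is an assumption to check against the create_dataset() signature of your installed version.

```python
from urllib.parse import urlparse

# Schemes treated as remote locations (assumption: adjust to what your setup can fetch).
REMOTE_SCHEMES = {"http", "https", "s3"}


def split_sources(sources):
    """Hypothetical helper: split mixed data locations into (urls, local_paths)."""
    urls, local_paths = [], []
    for source in sources:
        # urlparse returns an empty scheme for plain relative paths such as "./my_images".
        if urlparse(source).scheme in REMOTE_SCHEMES:
            urls.append(source)
        else:
            local_paths.append(source)
    return urls, local_paths


urls, local_paths = split_sources([
    "https://remo-scripts.s3-eu-west-1.amazonaws.com/open_images_sample_dataset.zip",
    "./my_images",  # hypothetical local folder
])

# Passing both lists to Remo (the local_files keyword is an assumption):
# my_dataset = remo.create_dataset(name='mixed sources', urls=urls,
#                                  local_files=local_paths,
#                                  annotation_task='Object detection')
```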

Manage multiple datasets

Within Remo, you can host multiple datasets and retrieve any one of them when needed.

This lets you organize and reuse your data across projects.

Let's list all the datasets and retrieve one:

remo.list_datasets()

[Dataset 1 - 'ocr_symbols', Dataset 2 - 'test', Dataset 8 - 'open_images', Dataset 9 - 'test', Dataset 12 - 'open images detection']

# Replace 1 with the ID of the dataset you want, as listed by remo.list_datasets()
new_dataset = remo.get_dataset(1)
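Because several datasets can share a name (two are called 'test' in the listing above), looking up an ID by name can be less error-prone than reading it off by eye. The sketch below works on (id, name) pairs transcribed from that listing; latest_dataset_id is a hypothetical helper, picking the highest ID assumes IDs grow as datasets are created, and building the pairs with d.id and d.name in the commented lines is an assumption about the attributes of the objects returned by remo.list_datasets().

```python
def latest_dataset_id(datasets, name):
    """Hypothetical helper: return the highest ID among datasets called `name`.

    `datasets` is an iterable of (id, name) pairs.
    """
    matching = [dataset_id for dataset_id, dataset_name in datasets if dataset_name == name]
    if not matching:
        raise KeyError(f"no dataset named {name!r}")
    # Assumption: a higher ID means a more recently created dataset.
    return max(matching)


# Pairs copied from the remo.list_datasets() output above.
listing = [(1, "ocr_symbols"), (2, "test"), (8, "open_images"),
           (9, "test"), (12, "open images detection")]

dataset_id = latest_dataset_id(listing, "open images detection")

# Building the pairs from Remo directly (attribute names are an assumption):
# listing = [(d.id, d.name) for d in remo.list_datasets()]
# new_dataset = remo.get_dataset(dataset_id)
```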

Visualize

Remo also makes working on a computer vision project easier by providing an easy-to-use visual interface.

By calling the dataset.view() method, you can open an interactive interface that lets you visually inspect your images and their annotations.

You can visualize your dataset directly in Jupyter (or in a separate window if you are not a fan of notebooks):

my_dataset.view()
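The viewer setting from the start of the tutorial can be chosen automatically, so the same script works both inside and outside a notebook. A minimal sketch: checking the shell class name is a common heuristic rather than an official API (it may not recognize every hosted notebook environment), and 'browser' as the non-notebook value for remo.set_viewer() is an assumption to verify in the Remo documentation.

```python
def running_in_notebook() -> bool:
    """Return True when executed inside a Jupyter kernel (heuristic)."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a Jupyter kernel.
        return False
    shell = get_ipython()
    # Jupyter kernels use ZMQInteractiveShell; plain Python returns None,
    # and terminal IPython uses a different shell class.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


viewer = "jupyter" if running_in_notebook() else "browser"  # "browser" is an assumed value
# remo.set_viewer(viewer)
# my_dataset.view()
```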

Figure: the interactive dataset viewer (view_dataset.gif)

Annotation Statistics

Once data is in Remo, you can easily explore the statistics and other important properties of your data.

For example, you can quickly:

  • see what's contained in the annotations
  • check whether there are unbalanced classes
  • spot objects that appear in only a few images

You can do this by printing the stats of an annotation set or using the interactive UI.

Calling my_dataset.get_annotation_statistics() returns the statistics of each annotation set as a list of dictionaries:

my_dataset.get_annotation_statistics()

[{'AnnotationSet ID': 41, 'AnnotationSet name': 'Object detection', 'n_images': 10, 'n_classes': 18, 'n_objects': 98, 'top_3_classes': [{'name': 'Fruit', 'count': 27}, {'name': 'Sports equipment', 'count': 12}, {'name': 'Human arm', 'count': 10}], 'creation_date': None, 'last_modified_date': '2020-05-29T13:38:52.259776Z'}]
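Since the statistics come back as plain dictionaries, the imbalance check mentioned above can be scripted. The sketch below reuses the values printed above; summarize_stats and its threshold of 3 times the uniform share are hypothetical choices. Because only the top 3 classes are reported, it can flag over-represented classes but cannot see rare ones.

```python
def summarize_stats(stats, factor=3.0):
    """Hypothetical helper: summarize one entry of get_annotation_statistics()."""
    n_objects = stats["n_objects"]
    # Share each class would have if objects were spread evenly over all classes.
    uniform_share = 1 / stats["n_classes"]
    ratios = {
        c["name"]: (c["count"] / n_objects) / uniform_share
        for c in stats["top_3_classes"]
    }
    return {
        "objects_per_image": n_objects / stats["n_images"],
        "ratio_to_uniform": ratios,
        # Classes over-represented by more than `factor` times the uniform share.
        "possibly_dominant": [name for name, r in ratios.items() if r > factor],
    }


# One entry copied from the output above (date fields omitted).
stats = {
    "AnnotationSet ID": 41, "AnnotationSet name": "Object detection",
    "n_images": 10, "n_classes": 18, "n_objects": 98,
    "top_3_classes": [{"name": "Fruit", "count": 27},
                      {"name": "Sports equipment", "count": 12},
                      {"name": "Human arm", "count": 10}],
}
summary = summarize_stats(stats)
# With the live object: summarize_stats(my_dataset.get_annotation_statistics()[0])
```

With these numbers, Fruit holds about 28% of all objects, roughly five times what an even split over 18 classes would give.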

Calling my_dataset.view_annotation_stats() will show an interactive dashboard.

Here you can inspect the annotations in more detail and manage your classes and tags.

my_dataset.view_annotation_stats()

Figure: the annotation statistics dashboard (annotation_statistics.png)

Export Annotations

To use the dataset for training a model, you can export the annotations to a standard format such as CSV or JSON:

my_dataset.export_annotations_to_file('output.csv', annotation_format='csv')
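Once exported, the file can be checked with standard tools, for example to spot classes that appear in only a few images. A minimal sketch using the csv module: class_coverage is a hypothetical helper, the column names file_name and class_name are assumptions about the export layout (check the header of output.csv and adjust), and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def class_coverage(csv_text, image_column="file_name", class_column="class_name"):
    """Hypothetical helper: per class, return (number of objects, number of distinct images)."""
    objects = Counter()
    images = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        objects[row[class_column]] += 1
        images[row[class_column]].add(row[image_column])
    return {name: (objects[name], len(images[name])) for name in objects}


# Made-up rows in the assumed layout, for illustration only.
sample = """file_name,class_name
a.jpg,Fruit
a.jpg,Fruit
b.jpg,Fruit
b.jpg,Human arm
"""
coverage = class_coverage(sample)
# Classes present in fewer than 2 images.
rare = [name for name, (_, n_images) in coverage.items() if n_images < 2]

# On the real export (column names assumed):
# with open('output.csv', newline='') as f:
#     coverage = class_coverage(f.read())
```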

Further functionalities

Refer to the other tutorials and the documentation to explore the library further and learn how to manage your datasets better.

Some of the other things you can do include:

  • Easily experimenting with different choices of annotations from code
  • Uploading custom annotations and predictions, and visualizing them together
  • Running advanced image searches by class, tag and filename
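Remo's own search works inside the tool. As a rough offline stand-in, rows of an exported CSV can be filtered by class and filename pattern with the standard library. This is a sketch, not Remo's search: filter_rows is hypothetical, the column names file_name and class_name are assumptions about the export layout, tags are not covered, and the sample rows are made up.

```python
import csv
import io
from fnmatch import fnmatch


def filter_rows(csv_text, classes=None, filename_pattern="*",
                image_column="file_name", class_column="class_name"):
    """Hypothetical helper: keep rows whose class is in `classes` and whose
    filename matches the glob `filename_pattern`."""
    wanted = set(classes) if classes else None
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if wanted is not None and row[class_column] not in wanted:
            continue
        if not fnmatch(row[image_column], filename_pattern):
            continue
        matches.append(row)
    return matches


# Made-up rows in the assumed layout, for illustration only.
sample = """file_name,class_name
img_001.jpg,Fruit
img_002.jpg,Human arm
val_003.jpg,Fruit
"""
fruit_in_img = filter_rows(sample, classes=["Fruit"], filename_pattern="img_*")
```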