Air Canva
Computer vision is a field of artificial intelligence that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs — and take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.
Computer vision trains machines to perform these functions, but it has to do it in much less time with cameras, data and algorithms rather than retinas, optic nerves and a visual cortex. Because a system trained to inspect products or watch a production asset can analyze thousands of products or processes a minute, noticing imperceptible defects or issues, it can quickly surpass human capabilities.
It runs analyses of data over and over until it discerns distinctions and ultimately recognizes images. For example, to train a computer to recognize automobile tires, it must be fed vast quantities of tire images and tire-related items so that it learns the differences and can recognize a tire, especially one with no defects.
Two essential technologies are used to accomplish this: a type of machine learning called deep learning and a convolutional neural network (CNN).
A Convolutional Neural Network, also known as CNN or ConvNet, is a class of neural networks that specializes in processing data that has a grid-like topology, such as an image. A digital image is a binary representation of visual data. It contains a series of pixels arranged in a grid-like fashion that contains pixel values to denote how bright and what color each pixel should be.
This script provides an air canvas, so that you can draw with your fingers on a virtual whiteboard. For this purpose, two modules are defined:
'main.py', which is responsible for 'drawing' on the virtual board;
'hand_tracking.py', which is responsible for detecting and tracking the fingers of the hand.
In the hand_tracking module, the code uses the 'OpenCV' and 'Mediapipe' libraries to recognize images. The class 'HandDetector' is defined to find hands in an image and calculate the position of hand landmarks. The HandDetector class contains the following methods:
'__init__': initializes the variables mode=False, which sets whether images are recognized as static pictures or as a video stream; maxHands=2, the maximum number of hands to detect; detectionCon=0.5, the minimum confidence score for declaring that a hand has been detected; trackCon=0.5, the minimum confidence score for successfully tracking a landmark; and complexity=1, the complexity of the hand landmark model.
'findHands': takes an image as input and converts it from BGR to RGB. It then processes the image to find hands and draws landmarks on the detected hands.
'findPosition': takes an image, the hand number, and a Boolean flag as input and returns a list of landmark positions. The function finds the position of the hand and draws its landmarks on the image.
'fingersUp': checks whether each fingertip is higher than the preceding landmark and returns a list of 1s and 0s depending on the position of the fingers.
'main': when the module is run directly, the code captures the video from the webcam, finds the landmarks on the hands, calculates the position of the hands, resizes the image to the screen size, flips the image horizontally, calculates the frames per second, draws them on the image, displays the image, and checks whether the user pressed the 'q' key. If so, the program stops.
In the main module we find the 'cv2' (OpenCV) module to capture video from the webcam, the 'numpy' module to work with image arrays, and the 'os' module to get the list of images from the 'header' folder used to create the air canvas header. It also imports two custom modules, 'utils' and 'hand_tracking'. The script captures the video from the webcam, then detects the hands in the frame using the HandDetector object defined in the hand_tracking module. Then, using the coordinates of the detected fingers, the code counts the number of raised fingers and performs different actions accordingly:
If the user raises only the index finger, they can draw and write on the air canvas. The default color is blue.
If the user raises the index and middle fingers (keeping the thumb down), the program enters 'selection' mode. The user stops writing and can move to the top of the window, where a drop-down menu opens in which they can change the color, erase, save the image, or wipe the board.
If only the thumb and index finger are detected, the user can change the size of the 'brush': increasing the distance between thumb and index finger increases the brush size, and bringing them closer together decreases it.
If the thumb, index, and middle fingers are raised at the same time, the program recognizes the save command without the user having to select it from the menu. In this case, a 3-second timer is activated so that the user can steady their hand. Two images are saved: one as the user sees it, with the background, and one with the drawing on a white background. The saved images are in '.jpg' format.
If all five fingers are lifted, the program understands that it must stop, and it closes.
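The gesture handling above can be summarized as a small dispatch over the list returned by fingersUp(), plus a distance-based brush size. The mode names, size bounds, and scaling factor here are hypothetical labels for illustration, not identifiers from the source:

```python
import math


def interpret_gesture(fingers):
    """Map the fingersUp() list [thumb, index, middle, ring, pinky]
    (1 = raised, 0 = down) to the actions described above."""
    if fingers == [0, 1, 0, 0, 0]:
        return "draw"          # index only: draw with the current color
    if fingers == [0, 1, 1, 0, 0]:
        return "select"        # index + middle: open the top menu
    if fingers == [1, 1, 0, 0, 0]:
        return "resize_brush"  # thumb + index: change the brush size
    if fingers == [1, 1, 1, 0, 0]:
        return "save"          # thumb + index + middle: quick save
    if fingers == [1, 1, 1, 1, 1]:
        return "quit"          # all five fingers: exit the program
    return "idle"


def brush_size(thumb, index, min_size=5, max_size=50):
    """Map the thumb-index pixel distance to a brush size.

    The bounds and the /4 scaling are assumptions for the sketch."""
    d = math.hypot(index[0] - thumb[0], index[1] - thumb[1])
    return int(max(min_size, min(max_size, d / 4)))
```

For example, `interpret_gesture([0, 1, 0, 0, 0])` yields the drawing mode, and spreading the thumb and index finger 100 pixels apart gives `brush_size((0, 0), (0, 100)) == 25`.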
The numpy module is used to create the images: depending on the values assigned to the arrays, an empty image is initialized to be drawn on, along with a white image used to save the final result. The program also uses the 'time' module to manage the timer and the 'math' module to calculate the distance between the thumb and index finger. Images are handled in this way:
Two images are initialized: a white one, np.ones((720, 1280, 3), np.uint8) * 255, which does not change during the process, and a black one, np.zeros((720, 1280, 3), np.uint8), which is used to make the background transparent and to show the user's strokes while writing.
Next, the real-time image is read from the camera via the cv2 module.
The strokes are drawn on the real-time image, on the white-background image, and on the black-background image.
The black-background image is converted to grayscale.
A transformation is applied via the 'threshold' function of cv2 so that array values below the threshold are set to black, while those above it are set to white.
The image is converted back to BGR.
Finally, this image is subtracted from the camera source image, making the background transparent.
Project Information
Category
: ML
Project Url
:
About
: Created an Air Canva so you can draw using your fingers.