Machine Learning

The document discusses controlling a VLC media player using hand gestures detected with a CNN model and OpenCV. TensorFlow and OpenCV are used to build the CNN model and capture video. The CNN kernel is used to extract features from images to classify hand gestures for playing, pausing, forwarding and rewinding media.

Uploaded by

5177RAJU
Copyright
© All Rights Reserved

Controlling VLC player using hand gestures

By
D.Lohith Varma
G.Bharadwaj
J.Zaheer Khan
M.Jithin Kasyap
Modules

➢ TensorFlow
➢ OpenCV

TensorFlow

➢ TensorFlow is a free and open-source software library for machine learning and artificial intelligence.
➢ It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
➢ TensorFlow is a symbolic math library based on dataflow and differentiable programming.


OpenCV

➢ OpenCV is a library of programming functions mainly aimed at real-time computer vision.
➢ It mainly focuses on image processing, video capture and analysis, including features like face detection and object detection.
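As a small illustration of the image processing mentioned above, the sketch below reproduces inverse binary thresholding in plain NumPy. This is the operation OpenCV exposes through cv2.threshold with the cv2.THRESH_BINARY_INV flag. The function name threshold_binary_inv and the sample patch are hypothetical, chosen only for this example.

```python
import numpy as np


def threshold_binary_inv(gray, thresh=120, maxval=255):
    """Inverse binary threshold: pixels brighter than `thresh` become 0,
    all other pixels become `maxval`."""
    gray = np.asarray(gray)
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)


# Hypothetical 2x3 grayscale patch (values 0..255).
patch = np.array([[10, 120, 121],
                  [200, 0, 255]], dtype=np.uint8)

mask = threshold_binary_inv(patch)
print(mask)
```

Dark pixels end up white in the mask and bright pixels end up black, which is one way to separate a hand from a lighter background before passing the image to a classifier.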


CNN kernel

➢ In a Convolutional Neural Network, the kernel is a filter used to extract features from images.
➢ The kernel is a small matrix that slides over the input data, takes the dot product with each sub-region of the input, and outputs the matrix of those dot products.
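The sliding dot product described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the model used in this project: the function name convolve2d_valid, the 4x4 sample image and the 2x2 edge kernel are all assumptions made for the example. It uses stride 1 with no padding and does not flip the kernel, as is usual for CNN layers.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return
    the matrix of dot products with each sub-region."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)  # dot product of region and kernel
    return out


# Hypothetical image: dark left half, bright right half.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])

# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 1],
                   [-1, 1]])

features = convolve2d_valid(image, kernel)
print(features)
```

The output is non-zero only in the middle column, where the kernel straddles the edge. This is the sense in which a kernel "extracts a feature".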
CNN kernel for video capture
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from collections import Counter
from keras import layers
from keras.layers import Input, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D
from keras.layers import AveragePooling2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing import image
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.applications.imagenet_utils import preprocess_input
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from sklearn.metrics import confusion_matrix
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical
from keras import optimizers
import cv2
from itertools import chain
import glob

Link:- https://www.kaggle.com/mosfather/cnn-kernel-for-video-capture
Libraries

Pynput:-

It is used for controlling the keyboard. Here it sends the space bar and arrow keys, which play, pause, forward and rewind the media player.

Matplotlib:-

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
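As a rough sketch of how pynput can send the keys described above, the snippet below maps a gesture label to a key name and then taps that key. The integer labels 0 to 4, the names GESTURE_TO_KEY, key_name_for and tap are assumptions for illustration. Only Controller, Key, press and release come from pynput itself, and the import is deferred so the lookup can be tried on a machine without a display.

```python
# Hypothetical gesture label -> pynput key name (attribute of pynput.keyboard.Key).
GESTURE_TO_KEY = {0: "space", 1: "up", 2: "down", 3: "left", 4: "right"}


def key_name_for(gesture):
    """Return the key name for a gesture label, or None if it is unknown."""
    return GESTURE_TO_KEY.get(gesture)


def tap(gesture):
    """Press and release the key mapped to `gesture`. Returns True if a key was sent."""
    # Imported here because pynput needs a desktop session to load.
    from pynput.keyboard import Controller, Key

    name = key_name_for(gesture)
    if name is None:
        return False
    key = getattr(Key, name)
    keyboard = Controller()
    keyboard.press(key)
    keyboard.release(key)
    return True


print(key_name_for(3))
```

Calling tap(0) while the media player window has focus would send a space bar press. Pairing every press with a release avoids leaving a key held down.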
Code
import time

import cv2
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import load_model
from pynput.keyboard import Controller, Key

cont = Controller()
flag = True                 # True when a key press is allowed
start = time.time()         # time of the last cooldown reset (not defined on the original slide)
cap = cv2.VideoCapture(0)   # default webcam (not defined on the original slide)

model = load_model('handmodel_fingers_weights.hdf5')

while True:
    wind = np.zeros((200, 200, 3))
    ok, frame = cap.read()
    if not ok:
        break               # camera unavailable or stream ended
    frame = cv2.flip(frame, 1)              # mirror the image

    show = frame[50:200, 400:550]           # region of interest shown to the user

    # Preprocess the same region: blur, grayscale, inverse binary threshold.
    frame = cv2.blur(frame, (2, 2))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = gray[50:200, 400:550]
    _, mask = cv2.threshold(gray, 120, 255, cv2.THRESH_BINARY_INV)

    # Scale to [0, 1] and shape as a batch of one 128x128 single-channel image.
    mask = mask / 255.0
    mask = cv2.resize(mask, (128, 128))
    mask = mask.reshape(-1, 128, 128, 1)
    result = model.predict(mask)
    res = np.argmax(result)                 # predicted gesture class

    cv2.putText(wind, "{}".format(res), (50, 125), cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 255, 0), 2)

    if flag:
        if res == 0:
            cont.press(Key.space)
            cont.release(Key.space)         # the slide pressed space without releasing it
            flag = False
        elif res == 1:
            cont.press(Key.up)
            cont.release(Key.up)
            flag = False
        elif res == 2:
            cont.press(Key.down)
            cont.release(Key.down)
            flag = False
        elif res == 3:
            cont.press(Key.left)
            cont.release(Key.left)
            flag = False
        elif res == 4:
            cont.press(Key.right)
            cont.release(Key.right)
            flag = False

    cv2.imshow("main", show)
    cv2.imshow("result", mask.reshape(128, 128))
    cv2.imshow("", wind)

    # Allow a new key press once more than 2 seconds have passed.
    end = time.time()
    if (end - start) > 2:
        start = end
        flag = True

    if cv2.waitKey(1) == 27:                # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
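The flag and timer in the loop above stop one held gesture from sending a key on every frame. A slightly different way to express the same idea is a small cooldown helper, sketched below. The class name Cooldown and the injectable clock parameter are assumptions for illustration. Unlike the loop above, this variant measures the interval from the last key press rather than from a free-running two-second timer.

```python
import time


class Cooldown:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval=2.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock          # injectable so the logic can be checked without waiting
        self.last_fired = None

    def ready(self):
        """Return True, and start a new cooldown, if the interval has elapsed."""
        now = self.clock()
        if self.last_fired is None or (now - self.last_fired) >= self.interval:
            self.last_fired = now
            return True
        return False


# Demonstration with a fake clock instead of real time.
fake_time = [0.0]
cooldown = Cooldown(interval=2.0, clock=lambda: fake_time[0])

first = cooldown.ready()    # nothing fired yet, so allowed
fake_time[0] = 1.0
second = cooldown.ready()   # only 1 s later, so blocked
fake_time[0] = 2.5
third = cooldown.ready()    # 2.5 s after the first press, so allowed
print(first, second, third)
```

In the main loop, a key would be sent only when a gesture is recognised and cooldown.ready() returns True.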
Output
[Three slides of screenshots; images not recoverable in this text.]