Thanks, Haritha. When training a ‘normal’ network, you need input features x_i to predict the rating y_i. And here is the problem: what are these x_i? As you can imagine, a lot of feature engineering is needed just to decide which x_i describe a movie appropriately in the first place. This is why collaborative filtering is a better approach in this case.
But I cannot tell whether this approach is actually better than using features x_i, since I have never tried it.
Maybe one good way to obtain x_i is simply to encode each movie as a feature vector and let the neural network treat this vector as learnable weights. This worked for me in the past when encoding words and characters in some natural language generation and understanding projects.
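
To make the idea of learnable movie vectors a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the toy ratings, the function names (train_movie_vectors, predict, mse), the per-user weight vector w_u, and the hyperparameters are all made up for this example and do not come from the course material. Each movie i gets a vector x_i that is updated by gradient descent together with the weights, instead of being hand-engineered.

```python
import random

def predict(x, w, user, movie):
    # Predicted rating: dot product of the user's weights and the movie's vector.
    return sum(wk * xk for wk, xk in zip(w[user], x[movie]))

def mse(x, w, ratings):
    # Mean squared error over all observed (user, movie, rating) triples.
    return sum((predict(x, w, u, i) - y) ** 2 for u, i, y in ratings) / len(ratings)

def init_vectors(count, dim, rng):
    # Small random starting values; these are the learnable parameters.
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(count)]

def train_movie_vectors(ratings, x, w, lr=0.02, epochs=500):
    # Stochastic gradient descent on the squared error.
    # Both the movie vectors x_i and the user weights w_u are updated,
    # so the x_i are learned rather than designed by hand.
    for _ in range(epochs):
        for u, i, y in ratings:
            err = predict(x, w, u, i) - y
            for k in range(len(x[i])):
                w_uk, x_ik = w[u][k], x[i][k]  # keep old values for both updates
                w[u][k] -= lr * err * x_ik
                x[i][k] -= lr * err * w_uk
    return x, w

# Hypothetical toy data: (user index, movie index, rating y_i on a 1-5 scale).
toy_ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 2.0),
               (2, 1, 5.0), (2, 2, 4.0), (0, 2, 1.0)]

rng = random.Random(0)
movie_vecs = init_vectors(3, 2, rng)   # x_i, one learnable vector per movie
user_weights = init_vectors(3, 2, rng) # w_u, one weight vector per user

error_before = mse(movie_vecs, user_weights, toy_ratings)
train_movie_vectors(toy_ratings, movie_vecs, user_weights)
error_after = mse(movie_vecs, user_weights, toy_ratings)
```

Note that once the x_i are learned jointly with per-user weights like this, the setup ends up very close to collaborative filtering itself; in a real network you would typically use an embedding layer from a deep learning framework instead of hand-written updates.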