Introduction to Apache Hive

This is an introductory course on Hive, one of the most widely used tools in big data. Hive is ETL and data warehouse software that lets users interact with data stored in the Hadoop Distributed File System (HDFS).

The course starts with an introduction to Hive and then works through the remaining topics using a hands-on approach. You will learn about internal and external table structures and how to read data in different formats into Hive tables. With easy, intuitive explanations, you will get a good grasp of how to load data into Hive, how to query it, and how to create views on Hive tables.

I am stuck on loading data from a CSV file into my table in the CloudxLab environment. Is there any solution to that? (This is for the project submission.)

hive> load local inpath 'yellow_tripdata_2015-01-06.csv' into table yellow_trip;
FAILED: ParseException line 1:5 missing DATA at 'local' near '<EOF>'
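
The ParseException above reports the missing DATA keyword: Hive's load statement has the form LOAD DATA [LOCAL] INPATH ... INTO TABLE .... A minimal corrected sketch follows, assuming the CSV file is in the directory from which the hive shell was started (a relative LOCAL path is resolved against the current working directory) and that the table yellow_trip already exists as a comma-delimited text table:

```sql
-- DATA was missing between LOAD and LOCAL in the failing statement.
-- LOCAL tells Hive to read the file from the local filesystem rather than HDFS.
-- Assumption: the file is in the current working directory of the hive shell.
LOAD DATA LOCAL INPATH 'yellow_tripdata_2015-01-06.csv' INTO TABLE yellow_trip;
```

If the file has already been copied to HDFS, the same statement without LOCAL, pointing at the HDFS path, should apply instead.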

Also, how are we supposed to answer questions like:
Q. What fraction of the total is paid for tolls? The toll is stored in tolls_amount.

As far as I can tell from the previous videos, there is no way to query the database with Hive commands to get this.
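
One way a question like this could be approached is with Hive's built-in SUM aggregate: divide the sum of the tolls by the sum of the overall fare. This is only a sketch under assumptions. The column total_amount is assumed to hold the total paid per trip (it is not named in this thread), and both columns are assumed to be numeric or castable to numbers.

```sql
-- Fraction of the total paid that went to tolls.
-- Assumption: total_amount is the per-trip total; only tolls_amount is named in the question.
SELECT SUM(tolls_amount) / SUM(total_amount) AS toll_fraction
FROM yellow_trip;
```

SUM ignores NULL values, so a CSV header row that was loaded as data and cannot be read as a number should not affect the result.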

I am trying to work on the Hive project. Please tell me which dataset I need and where to download it from.

Dear Umang,

The dataset to use for the project is: yellow_tripdata_2015-01-06.csv

Please share the Hive.txt file that the trainer is using in “Hive Illustration : Basics”.