Hive metastore not accessible after creating partitions
as Won't Fix
7/2/2013 10:15:57 AM
User(s) can reproduce this bug
We have created a large number of partitions (about 1300) in a table.
After that, the metastore becomes very slow. For example, a "show tables" can take up to 30 seconds or even time out.
If we start a new Hive session, the metastore is fast again as long as we don't try to read the partitioned table.
If we do, for example with "select * from table limit 10;", the statement fails and the metastore becomes slow again.
We can neither access the partitioned table nor drop it.
Steps to Reproduce
Create a table with the structure below and create the partitions with the attached script
CREATE TABLE Statistics10Min(WindFarm STRING,WTG STRING, LocalTS STRING,TS STRING,WindSpeedAvg FLOAT,WindSpeedMin FLOAT,WindSpeedMax FLOAT,WindSpeedStdDev FLOAT,ActPowAvg FLOAT,ActPowMin FLOAT,ActPowMax FLOAT,ActPowStdDev FLOAT,ReactPowAvg FLOAT,ReactPowMin FLOAT,ReactPowMax FLOAT,ReactPowStdDev FLOAT,GenSpeedAvg FLOAT,GenSpeedMin FLOAT,GenSpeedMax FLOAT,GenSpeedStdDev FLOAT,RotSpeedAvg FLOAT,RotSpeedMin FLOAT,RotSpeedMax FLOAT,RotSpeedStdDev FLOAT,YawDirAvg FLOAT,YawDirMin FLOAT,YawDirMax FLOAT,YawDirStdDev FLOAT,VibTowAvg FLOAT,VibTowMin FLOAT,VibTowMax FLOAT,VibTowStdDev FLOAT,PitchAvg FLOAT,PitchMin FLOAT,PitchMax FLOAT,PitchRateMin FLOAT,PitchRateMax FLOAT,PitchRateStdDev FLOAT,GridVoltAvg FLOAT,GridVoltMin FLOAT,GridVoltMax FLOAT,EnvTempAvg FLOAT,EnvTempMin FLOAT,EnvTempMax FLOAT,GbxTempAvg FLOAT,GbxTempMin FLOAT,GbxTempMax FLOAT,GenAccMin FLOAT,GenAccMax FLOAT,VibratAvg FLOAT,VibratMax FLOAT,GenU1TempAvg FLOAT,GenU1TempMin FLOAT,GenU1TempMax FLOAT,RotorSideLTempAvg FLOAT,RotorSideLTempMin FLOAT,rotorSideLTempMax FLOAT,WindBin FLOAT,FreqMean FLOAT) PARTITIONED BY(CaptureDate STRING, Farm STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '59' LINES TERMINATED BY '10' STORED AS SEQUENCEFILE;
Non accessible table
Table with about 1300 partitions.
[MSFT] on 7/25/2013 at 2:09 PM
When you run a query on a table with a lot of partitions, the metastore database keeps processing the unfinished request even after the client side times out. That is likely why the simple query that follows also takes a long time in the same session.
Pablo Álvarez Doval
on 7/19/2013 at 9:25 AM
Thanks for the suggestion, but the problem is not how to configure a timeout value; it is why even a 'show tables;' takes so long. It seems that something gets corrupted in the Hive metastore, or that the Hive service is unable to reach the metastore.
[MSFT] on 7/10/2013 at 3:32 PM
To change the timeout configuration, you can also add --hiveconf hive.metastore.client.socket.timeout=600 to the hive command you run. The configuration can also be set through bootstrap actions (this feature will be available soon). In the meantime, we are investigating the root cause of the metastore timeout.
[MSFT] on 7/3/2013 at 10:05 AM
I reproduced the table and got the same read timeout when running the query. With a large number of partitions, the query has to scan much larger metastore tables, which takes time. There are two solutions for this kind of problem: define fewer partitions and create buckets inside each partition, or increase the timeout. The metastore timeout is currently set to 60 seconds, which some complex metastore operations can easily exceed. To increase the timeout, change the "hive.metastore.client.socket.timeout" property in hive-site.xml.
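As a rough illustration of the first suggestion (fewer partitions, with buckets inside each partition), the table could be declared along the following lines. This is only a sketch under assumptions that do not come from the thread: the table name Statistics10MinBucketed is hypothetical, the column list is abbreviated to the first few columns of the original table, partitioning by CaptureDate alone and bucketing on WindFarm is one possible layout, and the bucket count of 16 is arbitrary.

```sql
-- Sketch only: hypothetical table name, abbreviated column list,
-- and an arbitrary bucket count (16). Add the remaining FLOAT
-- columns from the original Statistics10Min definition as needed.
CREATE TABLE Statistics10MinBucketed (
  WindFarm STRING,
  WTG STRING,
  LocalTS STRING,
  TS STRING,
  WindSpeedAvg FLOAT
)
PARTITIONED BY (CaptureDate STRING)
CLUSTERED BY (WindFarm) INTO 16 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '59' LINES TERMINATED BY '10'
STORED AS SEQUENCEFILE;
```

With a layout like this, the metastore would track one partition per capture date instead of one per date and farm combination, so it has fewer partition entries to scan. Whether that brings the count low enough depends on how many distinct dates and farms the data contains.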