Technology, November 12, 2022

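Today, to find out why hiveserver was using so much memory, I joined the Apache Hive mailing list and spent a good while discussing the problem there. I have to say the people on the list are genuinely helpful: they take questions seriously, and their replies are very detailed.

Here are some excerpts from the thread: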

Did you update your JDK in last time? A java-dev told me that could be
a issue in JDK _26
(https://forums.oracle.com/forums/thread.jspa?threadID=2309872), some
devs report a memory decrease when they use GC - flags. I'm quite not
sure, sounds for me to far away.

The stacks have a lot waitings, but I see nothing special.

- Alex

2011/12/12 王锋 <wfeng1982@163.com>:
>
> The hive log:
>
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
> real=0.08 secs]
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
> real=0.07 secs]
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>
> Now we have 3 hiveservers and I set the concurrent job num to 4,but the Mem
> still be so large .I'm mad, God
>
> have other suggestions ?
>
> At 2011-12-12 17:59:52,"alo alt" <wget.null@googlemail.com> wrote:
>>When you start a high-load hive query can you watch the stack-traces?
>>Its possible over the webinterface:
>>http://jobtracker:50030/stacks
>>
>>- Alex
>>
>>
>>2011/12/12 王锋 <wfeng1982@163.com>
>>>
>>> hiveserver will throw oom after several hours .
>>>
>>>
>>> At 2011-12-12 17:39:21,"alo alt" <wget.null@googlemail.com> wrote:
>>>
>>> what happen when you set xmx=2048m or similar? Did that have any negative effects for running queries?
>>>
>>> 2011/12/12 王锋 <wfeng1982@163.com>
>>>>
>>>> I have modify hive jvm args.
>>>> the new args is -Xmx15000m -XX:NewRatio=1 -Xms2000m .
>>>>
>>>> but the memory used by hiveserver is still large.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2011-12-12 16:20:54,"Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>
>>>> Not from the running jobs, what I am saying is the heap size of the Hadoop really depends on the number of files, directories on the HDFS. Remove old files periodically or merge small files would bring in some performance boost.
>>>>
>>>> On the Hive end, the memory consumed also depends on the queries that are executed. Monitor the reducers of the Hadoop job, and my experiences are that reduce part could be the bottleneck here.
>>>>
>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>
>>>> 2011/12/12 王锋 <wfeng1982@163.com>
>>>>>
>>>>> is the files you said the files from runned jobs of our system? and them can't be so much large.
>>>>>
>>>>> why is the cause of namenode. what are hiveserver doing when it use so large memory?
>>>>>
>>>>> how do you use hive? our method using hiveserver is correct?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> At 2011-12-12 14:27:09,"Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>
>>>>> Not sure if this is because of the number of files, since the namenode would track each of the file and directory, and blocks.
>>>>> See this one. http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>
>>>>> Please correct me if I am wrong, because this seems to be more like a hdfs problem which is actually irrelevant to Hive.
>>>>>
>>>>> Thanks
>>>>> Aaron
>>>>>
>>>>> 2011/12/11 王锋 <wfeng1982@163.com>
>>>>>>
>>>>>>
>>>>>> I want to know why the hiveserver use so large memory,and where the memory has been used ?
>>>>>>
>>>>>> At 2011-12-12 10:02:44,"王锋" <wfeng1982@163.com> wrote:
>>>>>>
>>>>>>
>>>>>> The namenode summary:
>>>>>>
>>>>>>
>>>>>>
>>>>>> the mr summary
>>>>>>
>>>>>>
>>>>>> and hiveserver:
>>>>>>
>>>>>>
>>>>>> hiveserver jvm args:
>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>>>
>>>>>> now we using 3 hiveservers in the same machine.
>>>>>>
>>>>>>
>>>>>> At 2011-12-12 09:54:29,"Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>>
>>>>>> how's the data look like? and what's the size of the cluster?
>>>>>>
>>>>>> 2011/12/11 王锋 <wfeng1982@163.com>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm one of engieer of sina.com. We have used hive ,hiveserver several months. We have our own tasks schedule system .The system can schedule tasks running with hiveserver by jdbc.
>>>>>>>
>>>>>>> But The hiveserver use mem very large, usally large than 10g. we have 5min tasks which will be running every 5 minutes.,and have hourly tasks .total num of tasks is 40. And we start 3 hiveserver in one linux server,and be cycle connected .
>>>>>>>
>>>>>>> so why Memory of hiveserver using so large and how we do or some suggestion from you ?
>>>>>>>
>>>>>>> Thanks and Best Regards!
>>>>>>>
>>>>>>> Royce Wang
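
The GC log lines quoted near the top of the thread can be read mechanically. The following is a minimal Python sketch, assuming the Parallel GC log format shown there; the regular expression and the function name `parse_gc_line` are illustrative and not part of Hive or the JDK. It subtracts the young-generation occupancy from the total heap occupancy after a collection to estimate how much is held in the old generation.

```python
import re

# Matches Parallel GC lines like the ones quoted in the thread, e.g.
#   8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] 9102425K->7176256K(9867648K), 0.0765670 secs]
GC_LINE = re.compile(
    r"(?P<ts>\d+\.\d+): \[GC \[PSYoungGen: "
    r"(?P<y_before>\d+)K->(?P<y_after>\d+)K\((?P<y_cap>\d+)K\)\]\s+"
    r"(?P<h_before>\d+)K->(?P<h_after>\d+)K\((?P<h_cap>\d+)K\), "
    r"(?P<pause>\d+\.\d+) secs\]"
)


def parse_gc_line(line):
    """Return occupancy figures (in KB) after a minor GC, or None if the line does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    young_after = int(m.group("y_after"))
    heap_after = int(m.group("h_after"))
    return {
        "timestamp_s": float(m.group("ts")),
        "young_after_kb": young_after,
        "heap_after_kb": heap_after,
        # Heap occupancy outside the young generation is held by the old generation.
        "old_after_kb": heap_after - young_after,
        "pause_s": float(m.group("pause")),
    }


# First log line from the thread, with the mail line wrapping joined back together.
sample = ("8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] "
          "9102425K->7176256K(9867648K), 0.0765670 secs]")
stats = parse_gc_line(sample)
print(stats["old_after_kb"])  # 7175568
```

For the quoted line, the young generation is almost empty after the collection while about 7,175,568 KB (roughly 6.8 GiB) remains in the rest of the heap, which suggests that the memory is being retained in the old generation.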

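In the topmost reply, Alex points to a forum thread:

https://forums.oracle.com/forums/thread.jspa?threadID=2309872

It says that JDK 1.6.0_26 may have a memory leak, and that is exactly the version we are running. According to the thread, nobody can confirm this, and Oracle naturally says it is not their responsibility. One poster writes: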
I tried with java6u29 and java7 and they work great. Actually on the production server we are running for almost 4 days with java7 and it's stable, no crash, no slowdown, no restart in this period, and with less maximum memory. If it's going to last for a week then I trust it will go on fine.
So in the end someone did get stable runs with java6u29 and java7, java7 in particular. Tomorrow I will try switching the hiveserver machine to java7.

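Update:

Today I switched to JDK 7 for testing and the behavior was basically the same, so the JVM version does not seem to be the cause.

Running jmap -heap showed that hiveserver's young generation was not being sized according to the configured ratio: its maximum capacity was still the default of about 800 MB, which is too small for data analysis workloads. I set the young generation size with -Xmn, also set the maximum young generation size, and switched the collector to CMS. Memory usage is now stable at around 2.3 GB.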
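
As a small worked example of what the ratio alone would imply: -XX:NewRatio is the ratio of old generation to young generation, so the young generation gets heap / (NewRatio + 1). The Python sketch below applies this to the -Xmx15000m -XX:NewRatio=1 setting from the thread. The helper name `young_gen_mb` is hypothetical, and the sketch ignores any adaptive sizing the collector may apply at run time.

```python
def young_gen_mb(heap_mb, new_ratio):
    """Young generation size implied by -XX:NewRatio, where NewRatio = old / young."""
    return heap_mb / (new_ratio + 1)


# -Xmx15000m -XX:NewRatio=1 from the thread: young and old would each get half the heap.
expected_mb = young_gen_mb(15000, 1)
print(expected_mb)  # 7500.0

# The maximum young generation size the post reports seeing in jmap -heap.
observed_mb = 800
print(expected_mb / observed_mb)  # 9.375
```

By this arithmetic the reported 800 MB maximum is far below what the ratio implies, which is consistent with the decision above to size the young generation explicitly instead.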

The final parameters:

export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
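
To see how the size flags in this line relate to each other, here is a minimal Python sketch that extracts the size flags and computes the initial old generation as -Xms minus -Xmn. The helper names `size_to_mb` and `parse_sizes` are hypothetical, and only the size-related subset of the options above is included.

```python
import re

# Size-related subset of the final HADOOP_OPTS line above.
FINAL_OPTS = "-Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -XX:PermSize=800M -XX:MaxPermSize=800M"


def size_to_mb(value):
    """Convert a JVM size such as '4000m', '800M' or '2g' to megabytes."""
    m = re.fullmatch(r"(\d+)([kKmMgG])", value)
    if m is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = int(m.group(1)), m.group(2).lower()
    return number * {"k": 1 / 1024, "m": 1, "g": 1024}[unit]


def parse_sizes(opts):
    """Collect -Xms/-Xmx/-Xmn and -XX:<Name>=<size> flags into a dict of megabytes."""
    sizes = {}
    for token in opts.split():
        m = re.fullmatch(r"-X(ms|mx|mn)(\d+[kKmMgG])", token)
        if m:
            sizes["X" + m.group(1)] = size_to_mb(m.group(2))
            continue
        m = re.fullmatch(r"-XX:(\w+)=(\d+[kKmMgG])", token)
        if m:
            sizes[m.group(1)] = size_to_mb(m.group(2))
    return sizes


sizes = parse_sizes(FINAL_OPTS)
# The initial heap minus the young generation is what the old generation starts with.
initial_old_mb = sizes["Xms"] - sizes["Xmn"]
print(sizes)
print(initial_old_mb)  # 1000
```

With these values the old generation starts at 1000 MB out of a 5000 MB initial heap. The line itself sets no -Xmx, so the maximum heap comes from whatever $HADOOP_OPTS already contains or from the JVM default.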
