Tuesday, June 2, 2015

Significant Performance Drop

I unit tested an alpha algorithm for the identification phase. It isn't the last pass I need to implement but it's the heart of identification. During the implementation I was already a bit worried about how I implemented it, so I, again, did some performance measurements.

Well. A significant performance drop. The reason? It keeps a flat map of all declarations it can deduce. Every time you enter a namespace, or use a using directive, it searches through that whole map and declares in a local dictionary all declarations which share a certain prefix.

Tested on one large file, you get a whole lot of declarations, and for every namespace and using declaration a linear search. That's a quadratic slow-down I guesstimate. For a 1MB file, 1MB/min instead of 1MB/sec (though it takes 3ms for the simple file I copied into 2000 different namespaces to get to 1MB.)

Performance dropped that significantly that it'll only comfortably compile medium sized programs now. I'll leave it as is, since it's just one pass, it's unlikely people will write large monolithic files with lots of namespaces now, and (probably) solvable by someone implementing a better search tree for namespaces at some point.

It strengthens my belief that all algorithms in a compiler should be linear.

Below, performance measurements on a 1MB sized file.

Output of gprof:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 24.62     27.19    27.19 597938347     0.00     0.00  __gnu_cxx::__atomic_add(int volatile*, int)
 16.92     45.88    18.69 600390363     0.00     0.00  __gnu_cxx::__exchange_and_add(int volatile*, int)
 12.18     59.33    13.45 207564208     0.00     0.00  Ast::tag()
  6.73     66.77     7.44 392894022     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2> const&)
  4.73     71.99     5.22 194555526     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_S_right(std::_Rb_tree_node_base*)
  2.81     75.09     3.10 599838760     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
  1.63     76.89     1.80 597760346     0.00     0.00  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&)
  1.58     78.63     1.74 192160000     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_clone_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > 
const*)
  1.25     80.01     1.39 600390363     0.00     0.00  __gnu_cxx::__exchange_and_add_dispatch(int*, int)
  1.23     81.37     1.36 597938347     0.00     0.00  __gnu_cxx::__atomic_add_dispatch(int*, int)
  1.21     82.71     1.34 209242230     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::operator->() const
  1.04     83.86     1.15 192000000     0.00     0.00  RewriteIdentify::ast_has_prefix(icu_52::UnicodeString const&, std::shared_ptr<Ast> const&)
  0.86     84.81     0.95 1198328710     0.00     0.00  __gthread_active_p()
  0.76     85.65     0.84 199744948     0.00     0.00  std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::_M_valptr() const
  0.74     86.47     0.82    42017     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
  0.74     87.29     0.82     6005     0.00     0.01  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::
shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_copy(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > const*,
 std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >*)
  0.62     87.98     0.69 197855159     0.00     0.00  icu_52::UnicodeString::doCompare(int, int, icu_52::UnicodeString const&, int, int) const
  0.60     88.64     0.66 192160000     0.00     0.00  std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >* std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const,
 std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_create_nod
e<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&>(std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > const&)
  0.57     89.28     0.64 392894022     0.00     0.00  std::shared_ptr<Ast>::shared_ptr(std::shared_ptr<Ast> const&)
  0.57     89.91     0.63 394982420     0.00     0.00  std::__shared_ptr<Ast, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
  0.57     90.53     0.63 597938347     0.00     0.00  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy()
  0.57     91.16     0.63 197855159     0.00     0.00  icu_52::UnicodeString::pinIndices(int&, int&) const
  0.55     91.77     0.61 606783091     0.00     0.00  icu_52::UnicodeString::length() const
  0.55     92.38     0.61 192064000     0.00     0.00  std::_Rb_tree_iterator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > >::operator++()
  0.48     92.91     0.53 192318000     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:
:shared_ptr<Ast> > >, LessAstPtr, std::allocator<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > > >::_M_destroy_node(std::_Rb_tree_node<std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> > 
>*)
  0.45     93.41     0.50 599164355     0.00     0.00  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
  0.45     93.91     0.50 192549131     0.00     0.00  std::_Rb_tree<std::shared_ptr<Ast>, std::pair<std::shared_ptr<Ast> const, std::shared_ptr<Ast> >, std::_Select1st<std::pair<std::shared_ptr<Ast> const, std:

Heap measurements:





No comments:

Post a Comment